In this paper, we attempt to examine whether the use of change-point analysis techniques is appropriate for detecting fatigue based on data captured from wearable sensors. As such, we perform a secondary data analysis to the data generated in: Baghdadi et al., 2018. The reader should note that their raw data was preprocessed using:
Kalman Filter: Used to process the raw data from sensors to: (i) estimate the spatial orientation of the body with respect to the global reference frame, and (ii) to estimate the kinematics of motion.
Segmentation: The motion segments where then segmented using an algorithm that assumes the existence of two peaks in the translational acceleration of the gait cycle. This assumption was justified based on the results of Tongen and Wunderlich, 2010 as well as the authors’ preliminary analyses.
The reader can show any code chunk by clicking on the code button. We chose to make the default for the code hidden since we: (a) wanted to improve the readability of this document; and (b) assumed that the readers will not be interested in reading every code chunk.
The snippet below documents the list of R libraries that were used in this research. For convenience, we used the pacman package since it allows for installing/loading the needed libraries in one step.
rm(list = ls()) # clear global environment
graphics.off() # close all graphics
library(pacman) # needs to be installed first
p_load(R.matlab, plotly, extrafont, grDevices, gridExtra,
dplyr, stringr, tidyverse, utils, reshape2,
anomalize, forecast, MVN, fractal,
ecp, dfphase1)
In the snippet below, we extract the 15 “.mat” files in the GitHub repository (where we loaded the data to allow for the reproduction of our work). Note that these files were originally produced in Baghdadi et al., 2018. Then, we perform several transformation steps: (a) extracting the data for the first three columns in the matlab arrays; and (b) computing three kinematic features from the data corresponding to these columns. Due to the differences between Matlab and R, this requires two nested for loops. The outer loop increments over the number of subjects, while the inner loop increments based on the different number of rows of data for each subject. Please see the comments within the code chunk for more details.
num_subjects <- seq(1, 15)
subject_heights <- c(1.71, 1.77, 1.71, 1.59, 1.69,
1.63, 1.60, 1.71, 1.67, 1.78,
1.68, 1.55, 1.83, 1.81, 1.89)
# Initilizing a df for summary data on participants
summary_df <- data.frame(matrix(nrow = 15, ncol = 9))
colnames(summary_df) <- c("Subject.Num", "num.rows",
"num.cols", "mean.scaled.stride.len",
"sd.scaled.stride.len",
"mean.scaled.stride.height",
"sd.scaled.stride.height",
"mean.stride.duration",
"sd.stride.duartion")
for (i in 1:length(num_subjects)) {
# Reading the .mat files from GitHub
raw_data <- readMat(paste0("https://github.com/fmegahed/fatigue-changepoint/blob/master/Data/Raw/Subject",num_subjects[i],".mat?raw=true"))
# Compute the number of cells, and rows in each structered matrix
raw_data_size <- lengths(raw_data) # num of cells
num_rows <- raw_data_size / 17 # all data had 17 cols
# Initilizing the six lists needed for storing the data (we keep track of the top 3 for error checking)
time_in_sec <- vector("list", length = num_rows)
position_x <- vector("list", length = num_rows)
position_y <- vector("list", length = num_rows)
stride_time <- vector("list", length = num_rows)
stride_length <- vector("list", length = num_rows)
stride_height <- vector("list", length = num_rows)
stride_duration <- vector("list", length = num_rows)
# Following for loop is needed since R reads the structured array as a nested list. The list containing the data is called "M.i.k" and it transforms/reads the original array --> rowise. This means that our first three features (with the same timestamp) are always seperated with a distance equal to the total number of rows
for (j in 1:num_rows) {
position_x[[j]] <- raw_data[["M.i.k"]][[j]]
position_y[[j]] <- raw_data[["M.i.k"]][[num_rows + j]]
stride_time[[j]] <- raw_data[["M.i.k"]][[2 * num_rows + j]]
dataholder <- raw_data[["M.i.k"]][[16 * num_rows + j]] # data holder for time
# Computing the three needed kinematic features
stride_length[[j]] <-
range(position_x[[j]])[2] - range(position_x[[j]])[1]
stride_height[[j]] <-
range(position_y[[j]])[2] - range(position_y[[j]])[1]
stride_duration[[j]] <-
range(stride_time[[j]])[2] - range(stride_time[[j]])[1]
time_in_sec[[j]] <- lapply(dataholder, mean)# using mean time of stride as a time stamp
}
# Scaling and creating one data frame per subject
assign(paste0("subject", i, "_features"),
data.frame(time.from.start = unlist(time_in_sec),
scaled.stride.len = unlist(stride_length)/subject_heights[i],
scaled.stride.height = unlist(stride_height) / subject_heights[i],
stride.duration = unlist(stride_duration)
)
)
# Creating a Summary Data Frame
df_name <- paste0("subject", i, "_features")
summary_df[i, 1] <- paste0("subject", i)
summary_df[i, 2] <- get(df_name) %>% nrow()
summary_df[i, 3] <- get(df_name) %>% ncol()
summary_df[i, 4] <- get(df_name)[, 2] %>% mean() %>% round(digits = 4)
summary_df[i, 5] <- get(df_name)[, 2] %>% sd() %>% round(digits = 4)
summary_df[i, 6] <- get(df_name)[, 3] %>% mean() %>% round(digits = 4)
summary_df[i, 7] <- get(df_name)[, 3] %>% sd() %>% round(digits = 4)
summary_df[i, 8] <- get(df_name)[, 4] %>% mean() %>% round(digits = 4)
summary_df[i, 9] <- get(df_name)[, 4] %>% sd() %>% round(digits = 4)
}
# Printing the top six rows of Subject 4's data as an example
head(subject4_features) %>% round(digits = 3)
# A Summary of the features for all 15 participants
summary_df
rm(raw_data, raw_data_size, i, j, num_rows,
dataholder, num_subjects)
save.image(file = "./Data/RGenerated/FeatureGeneration.RData")
Based on the analysis above, there are three observations to be made. First, we scaled the stride length and height based on the subject’s height. This in essence allows us to capture the steps as a percentage of the person’s height. This reduces the between subject variablity in the data and is supported by the seminal work of Oberg et al., 1993. Second, we showed the first six rows from Subject 4 to provide the reader with some insights into the sampling frequeny (after the preprocessing done in Baghdadi et al. 2018). Note that the kinematic features are computed from the sensor channels provided in their paper. Third, we saved the generated data into an R.data file, which can be accessed by clicking on: FeatureGeneration.RData. We hope that this saved data allows other researchers to reproduce and/or build on our work.
In this task, we examined several approaches for outlier detection. We originally hypothesized that these outliers constitue faulty sensor readings (since they are not sustained), and thus, we can remove this data prior to any further analysis. However, based on a detailed investigation of the data and the application of several outlier detection methods (univariate and multivariate), we have learned that the data is non-normal, highly autocorrelated and non-stationary. Thus, these approaches are not suitable for removing the outliers from our data. Thus, we then investiaged the utilization of “filtering” based methods in Section 3 to impute/correct/smoothen the data.
For the sake of reproducibility and completeness, we provide the results from our implementation of several outlier detection methods in the subsections below. Note that these methods will not be incorporated in our final analysis. They are presented here only to document how we reached our final model/approach.
First, we use a standard line plot to depict the data for each feature by person. For the sake of facilitating the visualization process, we: (a) panel the plot such that the left, center and right panels correspond to the scaled stride length, scaled stride height and step duration, respectively; and (b) we save the results of each participant (P) in a different tab.
# We are reading the data locally (you can download the file from GitHub)
load(file = "./Data/RGenerated/FeatureGeneration.RData")
for (i in 1:15) {
df_transformed <- melt(get(paste0("subject",i,"_features")),
id.vars = "time.from.start",
measure.vars=c("scaled.stride.len",
"scaled.stride.height",
"stride.duration"
)
)
assign(paste0("p",i),
ggplot(data = df_transformed,
aes(x=time.from.start,y=value,group=variable,color=variable,
shape=variable)) +
geom_line() + theme_bw() +
#ylim(0,2) +
ggtitle(paste0("Participant ",i,": Pre-filtered Data")) +
theme(legend.position="none") +
facet_wrap(~variable,nrow=3, scales = "free_y")
)
cat('###',paste0("P", i), "{-}",'\n')
print(get(paste0("p",i)))
cat('\n \n')
stat_printdf <- get(paste0("subject",i,"_features"))
cat(paste0('<source> <p> Based on the <b>raw data</b>, Participant ', i,
' has an average scaled stride length of ',
round(mean(stat_printdf[,2]),2),
', an average scaled stride height of ',
round(mean(stat_printdf[,3]),2),
' and an average stride duration of ', round(mean(stat_printdf[,4]),2),
' seconds. Based on their height of ', subject_heights[i],
' meters, their average stride length and stride height are', ' ',
round(mean(stat_printdf[,2])*subject_heights[i],2),
' and ', ' ', round(mean(stat_printdf[,3])*subject_heights[i],2),
', respectively.', '</p> </source>'
)
)
cat(' \n \n')
}
Based on the raw data, Participant 1 has an average scaled stride length of 1.07, an average scaled stride height of 0.09 and an average stride duration of 1.06 seconds. Based on their height of 1.71 meters, their average stride length and stride height are 1.82 and 0.16, respectively.
Based on the raw data, Participant 2 has an average scaled stride length of 1.63, an average scaled stride height of 0.13 and an average stride duration of 0.95 seconds. Based on their height of 1.77 meters, their average stride length and stride height are 2.88 and 0.23, respectively.
Based on the raw data, Participant 3 has an average scaled stride length of 0.94, an average scaled stride height of 0.09 and an average stride duration of 1.09 seconds. Based on their height of 1.71 meters, their average stride length and stride height are 1.61 and 0.16, respectively.
Based on the raw data, Participant 4 has an average scaled stride length of 1.28, an average scaled stride height of 0.12 and an average stride duration of 0.95 seconds. Based on their height of 1.59 meters, their average stride length and stride height are 2.03 and 0.2, respectively.
Based on the raw data, Participant 5 has an average scaled stride length of 0.96, an average scaled stride height of 0.09 and an average stride duration of 1.09 seconds. Based on their height of 1.69 meters, their average stride length and stride height are 1.63 and 0.15, respectively.
Based on the raw data, Participant 6 has an average scaled stride length of 0.91, an average scaled stride height of 0.11 and an average stride duration of 1.06 seconds. Based on their height of 1.63 meters, their average stride length and stride height are 1.48 and 0.18, respectively.
Based on the raw data, Participant 7 has an average scaled stride length of 0.84, an average scaled stride height of 0.09 and an average stride duration of 1.12 seconds. Based on their height of 1.6 meters, their average stride length and stride height are 1.34 and 0.15, respectively.
Based on the raw data, Participant 8 has an average scaled stride length of 1.53, an average scaled stride height of 0.14 and an average stride duration of 0.98 seconds. Based on their height of 1.71 meters, their average stride length and stride height are 2.62 and 0.24, respectively.
Based on the raw data, Participant 9 has an average scaled stride length of 0.87, an average scaled stride height of 0.1 and an average stride duration of 1.11 seconds. Based on their height of 1.67 meters, their average stride length and stride height are 1.46 and 0.17, respectively.
Based on the raw data, Participant 10 has an average scaled stride length of 1.28, an average scaled stride height of 0.11 and an average stride duration of 1.01 seconds. Based on their height of 1.78 meters, their average stride length and stride height are 2.27 and 0.2, respectively.
Based on the raw data, Participant 11 has an average scaled stride length of 1.2, an average scaled stride height of 0.09 and an average stride duration of 1.05 seconds. Based on their height of 1.68 meters, their average stride length and stride height are 2.02 and 0.15, respectively.
Based on the raw data, Participant 12 has an average scaled stride length of 1.61, an average scaled stride height of 0.14 and an average stride duration of 0.89 seconds. Based on their height of 1.55 meters, their average stride length and stride height are 2.5 and 0.21, respectively.
Based on the raw data, Participant 13 has an average scaled stride length of 0.67, an average scaled stride height of 0.07 and an average stride duration of 1.14 seconds. Based on their height of 1.83 meters, their average stride length and stride height are 1.23 and 0.13, respectively.
Based on the raw data, Participant 14 has an average scaled stride length of 1.32, an average scaled stride height of 0.14 and an average stride duration of 0.93 seconds. Based on their height of 1.81 meters, their average stride length and stride height are 2.39 and 0.25, respectively.
Based on the raw data, Participant 15 has an average scaled stride length of 1.18, an average scaled stride height of 0.1 and an average stride duration of 1.04 seconds. Based on their height of 1.89 meters, their average stride length and stride height are 2.24 and 0.2, respectively.
One of the most commonly used approaches for univariate outliers are defined to be the observations that lie outside \(\small 1.5 * IQR\), where the IQR is the inter quartile range. This can be easily implemented using in base R using the boxplot.stats function. Below, we first provide the output and data plots for each person.
# We are reading the data locally (you can download the file from GitHub)
load(file = "./Data/RGenerated/FeatureGeneration.RData")
outliers_bp <- list() # initilizing a list to store outliers per participant
for (i in 1:15) {
df <- get(paste0("subject",i,"_features")) # Getting Data from Sec.
# Obtaining the Outliers for each of the three variables
out_vals_sl<- boxplot.stats(df[,2])$out
out_rows_sl <- which(df[,2] %in% out_vals_sl)
out_vals_sh<- boxplot.stats(df[,3])$out
out_rows_sh <- which(df[,3] %in% out_vals_sh)
out_vals_sd<- boxplot.stats(df[,4])$out
out_rows_sd <- which(df[,4] %in% out_vals_sd)
# Generating a union set of all obs. that have outliers
# True: if any of the 3 vars for that obs. is an outlier
outliers_total <- unique(c(out_rows_sl, out_rows_sh,
out_rows_sd))
outliers_bp[[i]] <- outliers_total # saving it to list indexed by participant number
# Remove the observations corresponding to the outliers
assign(paste0("subject",i,"_bp"),
df[-outliers_total,])
# Preparing the data for the Line Graph
df_transformed <- melt(get(paste0("subject",i,"_bp")),
id.vars = "time.from.start",
measure.vars=c("scaled.stride.len",
"scaled.stride.height",
"stride.duration"
)
) # ggplot data needs to be tall
assign(paste0("g",i),
ggplot(data = df_transformed,
aes(x=time.from.start, y=value, group=variable,
color=variable,shape=variable)) +
geom_line() + theme_bw() +
#ylim(0,2) +
ggtitle("Outliers removed via the standard boxplot method \n (any point outside of 1.5*IQR was removed)") +
theme(legend.position="none",
#axis.text.x=element_text(angle=90,hjust=1),
plot.title = element_text(hjust = 0.5)) +
facet_wrap(~variable,nrow=3, scales = 'free_y')
)
cat('####',paste0("P", i), "{-}",'\n')
# Printing the % of outliers removed
cat("\n")
num_rows <- nrow(get(paste0("subject",i,"_features")))
num_outliers <- length(outliers_total)
percent_data <- round(100*num_outliers/num_rows,2)
cat("<b>",paste0("% of removed Observations for Partcipant ",i,
": ", percent_data,"%."), "</b>",
"This removes any value outside 1.5 * IQR for any of the 3 features. From the code, it should be clear that we are implementing this columnwise, starting with stride length and ending with stride duration.")
cat('\n \n')
# Plotting the data without the outliers
print(get(paste0("g",i)))
cat('\n \n')
stat_printdf <- get(paste0("subject",i,"_bp"))
cat(paste0('<source> <p> After applying the <b>boxplot outlier detection methodology</b>, Participant ', i,
' has an average scaled stride length of ',
round(mean(stat_printdf[,2]),2),
', an average scaled stride height of ',
round(mean(stat_printdf[,3]),2),
' and an average stride duration of ', round(mean(stat_printdf[,4]),2),
' seconds. Based on their height of ', subject_heights[i],
' meters, their average stride length and stride height are', ' ',
round(mean(stat_printdf[,2])*subject_heights[i],2),
' and ', ' ', round(mean(stat_printdf[,3])*subject_heights[i],2),
', respectively.', '</p> </source>'
)
)
cat('\n')
cat('<source> <p> Note that we could have tried making the value of the coefficient multiplied by the IQR coefficient larger (e.g., both the <i>tsoutlier() function</i> in the <a href="https://cran.r-project.org/web/packages/forecast/forecast.pdf">forecast package</a> and the <i> iqr() function </i> in the <a href="https://cran.r-project.org/web/packages/anomalize/anomalize.pdf">anomalize package</a> use a value of 3 for the coefficient); however, we did not explore this option since this: (a) would have possibly required a large amount of trial-and-error, and (b) the approach would be difficult to justify. </p> </source>')
cat(' \n \n')
}
% of removed Observations for Partcipant 1: 2.13%. This removes any value outside 1.5 * IQR for any of the 3 features. From the code, it should be clear that we are implementing this columnwise, starting with stride length and ending with stride duration.
After applying the boxplot outlier detection methodology, Participant 1 has an average scaled stride length of 1.07, an average scaled stride height of 0.09 and an average stride duration of 1.06 seconds. Based on their height of 1.71 meters, their average stride length and stride height are 1.83 and 0.16, respectively.
Note that we could have tried making the value of the coefficient multiplied by the IQR coefficient larger (e.g., both the tsoutlier() function in the forecast package and the iqr() function in the anomalize package use a value of 3 for the coefficient); however, we did not explore this option since this: (a) would have possibly required a large amount of trial-and-error, and (b) the approach would be difficult to justify.
% of removed Observations for Partcipant 2: 0.51%. This removes any value outside 1.5 * IQR for any of the 3 features. From the code, it should be clear that we are implementing this columnwise, starting with stride length and ending with stride duration.
After applying the boxplot outlier detection methodology, Participant 2 has an average scaled stride length of 1.63, an average scaled stride height of 0.13 and an average stride duration of 0.95 seconds. Based on their height of 1.77 meters, their average stride length and stride height are 2.89 and 0.23, respectively.
Note that we could have tried making the value of the coefficient multiplied by the IQR coefficient larger (e.g., both the tsoutlier() function in the forecast package and the iqr() function in the anomalize package use a value of 3 for the coefficient); however, we did not explore this option since this: (a) would have possibly required a large amount of trial-and-error, and (b) the approach would be difficult to justify.
% of removed Observations for Partcipant 3: 6.1%. This removes any value outside 1.5 * IQR for any of the 3 features. From the code, it should be clear that we are implementing this columnwise, starting with stride length and ending with stride duration.
After applying the boxplot outlier detection methodology, Participant 3 has an average scaled stride length of 0.94, an average scaled stride height of 0.09 and an average stride duration of 1.09 seconds. Based on their height of 1.71 meters, their average stride length and stride height are 1.61 and 0.15, respectively.
Note that we could have tried making the value of the coefficient multiplied by the IQR coefficient larger (e.g., both the tsoutlier() function in the forecast package and the iqr() function in the anomalize package use a value of 3 for the coefficient); however, we did not explore this option since this: (a) would have possibly required a large amount of trial-and-error, and (b) the approach would be difficult to justify.
% of removed Observations for Partcipant 4: 8.93%. This removes any value outside 1.5 * IQR for any of the 3 features. From the code, it should be clear that we are implementing this columnwise, starting with stride length and ending with stride duration.
After applying the boxplot outlier detection methodology, Participant 4 has an average scaled stride length of 1.32, an average scaled stride height of 0.12 and an average stride duration of 0.94 seconds. Based on their height of 1.59 meters, their average stride length and stride height are 2.1 and 0.2, respectively.
Note that we could have tried making the value of the coefficient multiplied by the IQR coefficient larger (e.g., both the tsoutlier() function in the forecast package and the iqr() function in the anomalize package use a value of 3 for the coefficient); however, we did not explore this option since this: (a) would have possibly required a large amount of trial-and-error, and (b) the approach would be difficult to justify.
% of removed Observations for Partcipant 5: 7.61%. This removes any value outside 1.5 * IQR for any of the 3 features. From the code, it should be clear that we are implementing this columnwise, starting with stride length and ending with stride duration.
After applying the boxplot outlier detection methodology, Participant 5 has an average scaled stride length of 0.99, an average scaled stride height of 0.09 and an average stride duration of 1.09 seconds. Based on their height of 1.69 meters, their average stride length and stride height are 1.66 and 0.15, respectively.
Note that we could have tried making the value of the coefficient multiplied by the IQR coefficient larger (e.g., both the tsoutlier() function in the forecast package and the iqr() function in the anomalize package use a value of 3 for the coefficient); however, we did not explore this option since this: (a) would have possibly required a large amount of trial-and-error, and (b) the approach would be difficult to justify.
% of removed Observations for Partcipant 6: 5.01%. This removes any value outside 1.5 * IQR for any of the 3 features. From the code, it should be clear that we are implementing this columnwise, starting with stride length and ending with stride duration.
After applying the boxplot outlier detection methodology, Participant 6 has an average scaled stride length of 0.91, an average scaled stride height of 0.11 and an average stride duration of 1.06 seconds. Based on their height of 1.63 meters, their average stride length and stride height are 1.49 and 0.18, respectively.
Note that we could have tried making the value of the coefficient multiplied by the IQR coefficient larger (e.g., both the tsoutlier() function in the forecast package and the iqr() function in the anomalize package use a value of 3 for the coefficient); however, we did not explore this option since this: (a) would have possibly required a large amount of trial-and-error, and (b) the approach would be difficult to justify.
% of removed Observations for Partcipant 7: 10.85%. This removes any value outside 1.5 * IQR for any of the 3 features. From the code, it should be clear that we are implementing this columnwise, starting with stride length and ending with stride duration.
After applying the boxplot outlier detection methodology, Participant 7 has an average scaled stride length of 0.85, an average scaled stride height of 0.09 and an average stride duration of 1.14 seconds. Based on their height of 1.6 meters, their average stride length and stride height are 1.36 and 0.15, respectively.
Note that we could have tried making the value of the coefficient multiplied by the IQR coefficient larger (e.g., both the tsoutlier() function in the forecast package and the iqr() function in the anomalize package use a value of 3 for the coefficient); however, we did not explore this option since this: (a) would have possibly required a large amount of trial-and-error, and (b) the approach would be difficult to justify.
% of removed Observations for Partcipant 8: 11.2%. This removes any value outside 1.5 * IQR for any of the 3 features. From the code, it should be clear that we are implementing this columnwise, starting with stride length and ending with stride duration.
After applying the boxplot outlier detection methodology, Participant 8 has an average scaled stride length of 1.6, an average scaled stride height of 0.14 and an average stride duration of 0.96 seconds. Based on their height of 1.71 meters, their average stride length and stride height are 2.74 and 0.24, respectively.
Note that we could have tried making the value of the coefficient multiplied by the IQR coefficient larger (e.g., both the tsoutlier() function in the forecast package and the iqr() function in the anomalize package use a value of 3 for the coefficient); however, we did not explore this option since this: (a) would have possibly required a large amount of trial-and-error, and (b) the approach would be difficult to justify.
% of removed Observations for Partcipant 9: 8.76%. This removes any value outside 1.5 * IQR for any of the 3 features. From the code, it should be clear that we are implementing this columnwise, starting with stride length and ending with stride duration.
After applying the boxplot outlier detection methodology, Participant 9 has an average scaled stride length of 0.86, an average scaled stride height of 0.1 and an average stride duration of 1.12 seconds. Based on their height of 1.67 meters, their average stride length and stride height are 1.44 and 0.17, respectively.
Note that we could have tried making the value of the coefficient multiplied by the IQR coefficient larger (e.g., both the tsoutlier() function in the forecast package and the iqr() function in the anomalize package use a value of 3 for the coefficient); however, we did not explore this option since this: (a) would have possibly required a large amount of trial-and-error, and (b) the approach would be difficult to justify.
% of removed Observations for Partcipant 10: 7.32%. This removes any value outside 1.5 * IQR for any of the 3 features. From the code, it should be clear that we are implementing this columnwise, starting with stride length and ending with stride duration.
After applying the boxplot outlier detection methodology, Participant 10 has an average scaled stride length of 1.33, an average scaled stride height of 0.11 and an average stride duration of 1 seconds. Based on their height of 1.78 meters, their average stride length and stride height are 2.36 and 0.19, respectively.
Note that we could have tried making the value of the coefficient multiplied by the IQR coefficient larger (e.g., both the tsoutlier() function in the forecast package and the iqr() function in the anomalize package use a value of 3 for the coefficient); however, we did not explore this option since this: (a) would have possibly required a large amount of trial-and-error, and (b) the approach would be difficult to justify.
% of removed Observations for Partcipant 11: 8.81%. This removes any value outside 1.5 * IQR for any of the 3 features. From the code, it should be clear that we are implementing this columnwise, starting with stride length and ending with stride duration.
After applying the boxplot outlier detection methodology, Participant 11 has an average scaled stride length of 1.22, an average scaled stride height of 0.09 and an average stride duration of 1.05 seconds. Based on their height of 1.68 meters, their average stride length and stride height are 2.06 and 0.15, respectively.
Note that we could have tried making the value of the coefficient multiplied by the IQR coefficient larger (e.g., both the tsoutlier() function in the forecast package and the iqr() function in the anomalize package use a value of 3 for the coefficient); however, we did not explore this option since this: (a) would have possibly required a large amount of trial-and-error, and (b) the approach would be difficult to justify.
% of removed Observations for Partcipant 12: 10.86%. This removes any value outside 1.5 * IQR for any of the 3 features. From the code, it should be clear that we are implementing this columnwise, starting with stride length and ending with stride duration.
After applying the boxplot outlier detection methodology, Participant 12 has an average scaled stride length of 1.68, an average scaled stride height of 0.14 and an average stride duration of 0.88 seconds. Based on their height of 1.55 meters, their average stride length and stride height are 2.61 and 0.21, respectively.
Note that we could have tried making the value of the coefficient multiplied by the IQR coefficient larger (e.g., both the tsoutlier() function in the forecast package and the iqr() function in the anomalize package use a value of 3 for the coefficient); however, we did not explore this option since this: (a) would have possibly required a large amount of trial-and-error, and (b) the approach would be difficult to justify.
% of removed Observations for Partcipant 13: 7.74%. This removes any value outside 1.5 * IQR for any of the 3 features. From the code, it should be clear that we are implementing this columnwise, starting with stride length and ending with stride duration.
After applying the boxplot outlier detection methodology, Participant 13 has an average scaled stride length of 0.68, an average scaled stride height of 0.07 and an average stride duration of 1.15 seconds. Based on their height of 1.83 meters, their average stride length and stride height are 1.25 and 0.13, respectively.
Note that we could have tried making the value of the coefficient multiplied by the IQR coefficient larger (e.g., both the tsoutlier() function in the forecast package and the iqr() function in the anomalize package use a value of 3 for the coefficient); however, we did not explore this option since this: (a) would have possibly required a large amount of trial-and-error, and (b) the approach would be difficult to justify.
% of removed Observations for Partcipant 14: 10.93%. This removes any value outside 1.5 * IQR for any of the 3 features. From the code, it should be clear that we are implementing this columnwise, starting with stride length and ending with stride duration.
After applying the boxplot outlier detection methodology, Participant 14 has an average scaled stride length of 1.39, an average scaled stride height of 0.14 and an average stride duration of 0.91 seconds. Based on their height of 1.81 meters, their average stride length and stride height are 2.51 and 0.25, respectively.
Note that we could have tried making the value of the coefficient multiplied by the IQR coefficient larger (e.g., both the tsoutlier() function in the forecast package and the iqr() function in the anomalize package use a value of 3 for the coefficient); however, we did not explore this option since this: (a) would have possibly required a large amount of trial-and-error, and (b) the approach would be difficult to justify.
% of removed Observations for Partcipant 15: 4.98%. This removes any value outside 1.5 * IQR for any of the 3 features. From the code, it should be clear that we are implementing this columnwise, starting with stride length and ending with stride duration.
After applying the boxplot outlier detection methodology, Participant 15 has an average scaled stride length of 1.2, an average scaled stride height of 0.1 and an average stride duration of 1.04 seconds. Based on their height of 1.89 meters, their average stride length and stride height are 2.27 and 0.19, respectively.
Note that we could have tried making the value of the coefficient multiplied by the IQR coefficient larger (e.g., both the tsoutlier() function in the forecast package and the iqr() function in the anomalize package use a value of 3 for the coefficient); however, we did not explore this option since this: (a) would have possibly required a large amount of trial-and-error, and (b) the approach would be difficult to justify.
# Saving a list of cleaned
save(subject1_bp, subject2_bp, subject3_bp, subject4_bp,
subject5_bp, subject6_bp, subject7_bp, subject8_bp,
subject9_bp, subject10_bp, subject11_bp, subject12_bp,
subject13_bp, subject14_bp, subject15_bp, outliers_bp,
file = "./Data/RGenerated/OutliersRemovedBoxplot.RData")
# We are reading the data locally (you can download the file from GitHub)
load(file = "./Data/RGenerated/FeatureGeneration.RData")
mal_results <- {}
for (i in 1:15) {
df <- get(paste0("subject",i,"_features"))
df <- df[,2:4]#only taking the four numeric fields
cat('####',paste0("P", i), "{-}",' \n')
cat('The output from the <source> <i> MVN package </i> </source> is shown below. Note that we depict the multivariate normal distribution for the three features of each distribution below. </source> <b> Essentially, the goal here is two-fold. First, we would like to examine if the joint distribution of the three kinematic features can be modeled using the multivariate normal distribution. Second, if the first hypothesis is true, then we can use the Mahalanobis Distance to identify outliers in each observation. </b> </source>')
plot.new()
mvn_test <- mvn(df, mvnTest = "dh",desc = TRUE,
univariateTest = "Lillie",
multivariateOutlierMethod = "adj",
showOutliers = TRUE) #adjusted Mahl Distance
cat(' \n')
cat(paste0('In addition to the plot, we provide the statistics of the Doornik-Hansen test for Participant ',i,' below. \n \n'))
mvn_Normality <- mvn_test$multivariateNormality
rownames(mvn_Normality) <- paste('Participant',i)
tab <- xtable::xtable(mvn_test$multivariateNormality,
align = c("c","c","c","c","c","c"))
print(tab, type= 'html',
html.table.attributes = 'align="center", rules="rows",
width=50%, frame="below"')
cat('<source> <p> \n </p> </source>')
cat(paste0('Based on the MVN test and the plot above, it is clear that the data is not normally distributed. Thus, an outlier detection/removal algorithm based on a MVN distribution will be inappropriate. This observation is true to not only Participant ', i, ' but to also all the other participants. Thus, in the following section we will investigate method(s) based on the signal processing literature detection literature.'))
cat(' \n \n')
}
The output from the In addition to the plot, we provide the statistics of the Doornik-Hansen test for Participant 1 below.
| Test | E | df | p value | MVN | |
|---|---|---|---|---|---|
| 1 | Doornik-Hansen | 14677.13 | 6.00 | 0.00 | NO |
Based on the MVN test and the plot above, it is clear that the data is not normally distributed. Thus, an outlier detection/removal algorithm based on a MVN distribution will be inappropriate. This observation is true to not only Participant 1 but to also all the other participants. Thus, in the following section we will investigate method(s) based on the signal processing literature detection literature.
The output from the In addition to the plot, we provide the statistics of the Doornik-Hansen test for Participant 2 below.
| Test | E | df | p value | MVN | |
|---|---|---|---|---|---|
| 1 | Doornik-Hansen | 7021.46 | 6.00 | 0.00 | NO |
Based on the MVN test and the plot above, it is clear that the data is not normally distributed. Thus, an outlier detection/removal algorithm based on a MVN distribution will be inappropriate. This observation is true to not only Participant 2 but to also all the other participants. Thus, in the following section we will investigate method(s) based on the signal processing literature detection literature.
The output from the In addition to the plot, we provide the statistics of the Doornik-Hansen test for Participant 3 below.
| Test | E | df | p value | MVN | |
|---|---|---|---|---|---|
| 1 | Doornik-Hansen | 2778.83 | 6.00 | 0.00 | NO |
Based on the MVN test and the plot above, it is clear that the data is not normally distributed. Thus, an outlier detection/removal algorithm based on a MVN distribution will be inappropriate. This observation is true to not only Participant 3 but to also all the other participants. Thus, in the following section we will investigate method(s) based on the signal processing literature detection literature.
The output from the In addition to the plot, we provide the statistics of the Doornik-Hansen test for Participant 4 below.
| Test | E | df | p value | MVN | |
|---|---|---|---|---|---|
| 1 | Doornik-Hansen | 10446.56 | 6.00 | 0.00 | NO |
Based on the MVN test and the plot above, it is clear that the data is not normally distributed. Thus, an outlier detection/removal algorithm based on a MVN distribution will be inappropriate. This observation is true to not only Participant 4 but to also all the other participants. Thus, in the following section we will investigate method(s) based on the signal processing literature detection literature.
The output from the In addition to the plot, we provide the statistics of the Doornik-Hansen test for Participant 5 below.
| Test | E | df | p value | MVN | |
|---|---|---|---|---|---|
| 1 | Doornik-Hansen | 6516.14 | 6.00 | 0.00 | NO |
Based on the MVN test and the plot above, it is clear that the data is not normally distributed. Thus, an outlier detection/removal algorithm based on a MVN distribution will be inappropriate. This observation is true to not only Participant 5 but to also all the other participants. Thus, in the following section we will investigate method(s) based on the signal processing literature detection literature.
The output from the In addition to the plot, we provide the statistics of the Doornik-Hansen test for Participant 6 below.
| Test | E | df | p value | MVN | |
|---|---|---|---|---|---|
| 1 | Doornik-Hansen | 15389.84 | 6.00 | 0.00 | NO |
Based on the MVN test and the plot above, it is clear that the data is not normally distributed. Thus, an outlier detection/removal algorithm based on a MVN distribution will be inappropriate. This observation is true to not only Participant 6 but to also all the other participants. Thus, in the following section we will investigate method(s) based on the signal processing literature detection literature.
The output from the In addition to the plot, we provide the statistics of the Doornik-Hansen test for Participant 7 below.
| Test | E | df | p value | MVN | |
|---|---|---|---|---|---|
| 1 | Doornik-Hansen | 3109.05 | 6.00 | 0.00 | NO |
Based on the MVN test and the plot above, it is clear that the data is not normally distributed. Thus, an outlier detection/removal algorithm based on a MVN distribution will be inappropriate. This observation is true to not only Participant 7 but to also all the other participants. Thus, in the following section we will investigate method(s) based on the signal processing literature detection literature.
The output from the In addition to the plot, we provide the statistics of the Doornik-Hansen test for Participant 8 below.
| Test | E | df | p value | MVN | |
|---|---|---|---|---|---|
| 1 | Doornik-Hansen | 6399.80 | 6.00 | 0.00 | NO |
Based on the MVN test and the plot above, it is clear that the data is not normally distributed. Thus, an outlier detection/removal algorithm based on a MVN distribution will be inappropriate. This observation is true to not only Participant 8 but to also all the other participants. Thus, in the following section we will investigate method(s) based on the signal processing literature detection literature.
The output from the In addition to the plot, we provide the statistics of the Doornik-Hansen test for Participant 9 below.
| Test | E | df | p value | MVN | |
|---|---|---|---|---|---|
| 1 | Doornik-Hansen | 5815.28 | 6.00 | 0.00 | NO |
Based on the MVN test and the plot above, it is clear that the data is not normally distributed. Thus, an outlier detection/removal algorithm based on a MVN distribution will be inappropriate. This observation is true to not only Participant 9 but to also all the other participants. Thus, in the following section we will investigate method(s) based on the signal processing literature detection literature.
The output from the In addition to the plot, we provide the statistics of the Doornik-Hansen test for Participant 10 below.
| Test | E | df | p value | MVN | |
|---|---|---|---|---|---|
| 1 | Doornik-Hansen | 14166.64 | 6.00 | 0.00 | NO |
Based on the MVN test and the plot above, it is clear that the data is not normally distributed. Thus, an outlier detection/removal algorithm based on a MVN distribution will be inappropriate. This observation is true to not only Participant 10 but to also all the other participants. Thus, in the following section we will investigate method(s) based on the signal processing literature detection literature.
The output from the In addition to the plot, we provide the statistics of the Doornik-Hansen test for Participant 11 below.
| Test | E | df | p value | MVN | |
|---|---|---|---|---|---|
| 1 | Doornik-Hansen | 15132.69 | 6.00 | 0.00 | NO |
Based on the MVN test and the plot above, it is clear that the data is not normally distributed. Thus, an outlier detection/removal algorithm based on a MVN distribution will be inappropriate. This observation is true to not only Participant 11 but to also all the other participants. Thus, in the following section we will investigate method(s) based on the signal processing literature detection literature.
The output from the In addition to the plot, we provide the statistics of the Doornik-Hansen test for Participant 12 below.
| Test | E | df | p value | MVN | |
|---|---|---|---|---|---|
| 1 | Doornik-Hansen | 8902.87 | 6.00 | 0.00 | NO |
Based on the MVN test and the plot above, it is clear that the data is not normally distributed. Thus, an outlier detection/removal algorithm based on a MVN distribution will be inappropriate. This observation is true to not only Participant 12 but to also all the other participants. Thus, in the following section we will investigate method(s) based on the signal processing literature detection literature.
The output from the In addition to the plot, we provide the statistics of the Doornik-Hansen test for Participant 13 below.
| Test | E | df | p value | MVN | |
|---|---|---|---|---|---|
| 1 | Doornik-Hansen | 6561.33 | 6.00 | 0.00 | NO |
Based on the MVN test and the plot above, it is clear that the data is not normally distributed. Thus, an outlier detection/removal algorithm based on a MVN distribution will be inappropriate. This observation is true to not only Participant 13 but to also all the other participants. Thus, in the following section we will investigate method(s) based on the signal processing literature detection literature.
The output from the In addition to the plot, we provide the statistics of the Doornik-Hansen test for Participant 14 below.
| Test | E | df | p value | MVN | |
|---|---|---|---|---|---|
| 1 | Doornik-Hansen | 7265.08 | 6.00 | 0.00 | NO |
Based on the MVN test and the plot above, it is clear that the data is not normally distributed. Thus, an outlier detection/removal algorithm based on a MVN distribution will be inappropriate. This observation is true to not only Participant 14 but to also all the other participants. Thus, in the following section we will investigate method(s) based on the signal processing literature detection literature.
The output from the In addition to the plot, we provide the statistics of the Doornik-Hansen test for Participant 15 below.
| Test | E | df | p value | MVN | |
|---|---|---|---|---|---|
| 1 | Doornik-Hansen | 22184.55 | 6.00 | 0.00 | NO |
Based on the MVN test and the plot above, it is clear that the data is not normally distributed. Thus, an outlier detection/removal algorithm based on a MVN distribution will be inappropriate. This observation is true to not only Participant 15 but to also all the other participants. Thus, in the following section we will investigate method(s) based on the signal processing literature detection literature.
An alternative approach to detecting and then removing outliers is to use smoothing techniques to “correct” the data. Based on the results shown in the previous sections, the fundamental hypotheses behind this approach are:
The sensors’ data seem to be non-stationary, autocorrelated and somewhat complex (from a statistical perspective). More importantly, there are spikes in the data that cannot be explained by the kinematic model. These spikes produce values that are unrealistic (e.g., a scaled stride length > 1 is not possible since that means that the stride of the person is larger than their height). While this insight is useful, there is no hard limit that we can enforce since the literature reports mean values across the population instead of attempting to provide a physical threshold on what is possible.
The outliers tended to appear in pairs of two; indicating a possible issue in the segmentation of the data. Note that we label this data as outliers since it is not sustained (i.e., it is unlikely to say that a person is fatigued for only two stride).
Smoothing (i.e. a low pass filter on the data) will allow us to impute values for every stride (after an initial size window). This will allow us to preserve the successive nature of the strides, “fix” the outliers based on neighboring strides, and not throw away any observation.
An algorithmic approach based on these observations is implemented below. Note that we are using the median filter from the fractal package on CRAN. Note that in the analysis below, we are applying the median Filter, with n=29 (i.e., the median is calculated based on a overlaping moving window of size 29 strides). We chose this window size based on the recommendation of our biomechanics expert as it allows us to: (a) ignore the effect of any turns in the direction of motion; and (b) the corresponding window size is approximately 30 seconds long (which presents an easy to remember rule for practical application).
# We are reading the data locally (you can download the file from GitHub)
load(file = "./Data/RGenerated/FeatureGeneration.RData")
for (i in 1:15) {
df <- get(paste0("subject",i,"_features")) # Getting Data from Sec.
# Obtaining the Outliers for each of the three variables
time.from.start <- df[,1]
scaled.stride.len <- medianFilter(df[,2], order = 29) # moving window of size 29 obs
scaled.stride.height <- medianFilter(df[,3], order = 29) # moving window of size 29 obs
stride.duration <- medianFilter(df[,4], order = 29) # moving window of size 29 obs
# Computing the first difference to make seein the values easier
diff.time.from.start <- time.from.start[-1]
diff.scaled.stride.len <- diff(scaled.stride.len)
diff.scaled.stride.height <- diff(scaled.stride.height)
diff.stride.duration <- diff(stride.duration)
# Remove the observations corresponding to the outliers
assign(paste0("subject",i,"_medianf"),
data.frame(cbind(time.from.start, scaled.stride.len,
scaled.stride.height, stride.duration)))
assign(paste0("subject",i,"_median_diff_f"),
data.frame(cbind( diff.time.from.start, diff.scaled.stride.len,
diff.scaled.stride.height, diff.stride.duration)))
# Preparing the data for the Line Graph
df_transformed <- melt(get(paste0("subject",i,"_medianf")),
id.vars = "time.from.start",
measure.vars=c("scaled.stride.len",
"scaled.stride.height",
"stride.duration"
)
) # ggplot data needs to be tall
diff_df_transformed <- melt(get(paste0("subject",i,"_median_diff_f")),
id.vars = "diff.time.from.start",
measure.vars=c("diff.scaled.stride.len",
"diff.scaled.stride.height",
"diff.stride.duration"
)
) # ggplot data needs to be tall
assign(paste0("g",i),
ggplot(data = df_transformed,
aes(x=time.from.start, y=value, group=variable,
color=variable,shape=variable)) +
geom_line() + theme_bw() +
ggtitle("Data post the application of the median filter") +
theme(legend.position="none",
#axis.text.x=element_text(angle=90,hjust=1),
plot.title = element_text(hjust = 0.5)) +
facet_wrap(~variable,nrow=3, scales = "free_y")
)
assign(paste0("h",i),
ggplot(data = diff_df_transformed,
aes(x=diff.time.from.start, y=value, group=variable,
color=variable,shape=variable)) +
geom_line() + theme_bw() +
ggtitle("First Difference of the Signal post the median filter") +
theme(legend.position="none",
#axis.text.x=element_text(angle=90,hjust=1),
plot.title = element_text(hjust = 0.5)) +
facet_wrap(~variable,nrow=3, scales = "free_y")
)
cat('###',paste0("P", i), "{-}",'\n')
print(get(paste0("g",i)))
cat('\n')
cat(paste0('<source> <p> Based on the <b>median filter</b>, Participant ', i,
' has an average scaled stride length of ',
round(mean(scaled.stride.len),2),
', an average scaled stride height of ',
round(mean(scaled.stride.height),2),
' and an average stride duration of ', round(mean(stride.duration),2),
' seconds. Based on their height of ', subject_heights[i],
' meters, their average stride length and stride height are', ' ',
round(mean(scaled.stride.len)*subject_heights[i],2),
' and ', ' ', round(mean(scaled.stride.height)*subject_heights[i],2),
', respectively.', '</p> </source>'
)
)
cat('\n')
print(get(paste0("h",i)))
cat(paste0('<source> <p> Based on the <b>median filter </b>the interquarile range of the <b>first difference </b> of Participant ', i,
'\'s scaled stride length, scaled stride height and stride duration are',
' ',
round(IQR(diff.scaled.stride.len),3),
', ', round(IQR(diff.scaled.stride.height),3),
', and ', round(IQR(diff.stride.duration),3),
', respectively. </p> </source>'
)
)
cat(' \n \n')
cat('<source> <p> The data processed with the median filtered approach is stored at the <a href="https://github.com/fmegahed/fatigue-changepoint/blob/master/Data/RGenerated/MedianFilteredData.RData">following location</a> in our GitHub Repository.')
cat('\n \n')
}
Based on the median filter, Participant 1 has an average scaled stride length of 1.08, an average scaled stride height of 0.09 and an average stride duration of 1.06 seconds. Based on their height of 1.71 meters, their average stride length and stride height are 1.84 and 0.16, respectively.
Based on the median filter the interquarile range of the first difference of Participant 1’s scaled stride length, scaled stride height and stride duration are 0.001, 0, and 0, respectively.
The data processed with the median filtered approach is stored at the following location in our GitHub Repository.
Based on the median filter, Participant 2 has an average scaled stride length of 1.65, an average scaled stride height of 0.13 and an average stride duration of 0.94 seconds. Based on their height of 1.77 meters, their average stride length and stride height are 2.91 and 0.24, respectively.
Based on the median filter the interquarile range of the first difference of Participant 2’s scaled stride length, scaled stride height and stride duration are 0.001, 0, and 0, respectively.
The data processed with the median filtered approach is stored at the following location in our GitHub Repository.
Based on the median filter, Participant 3 has an average scaled stride length of 0.94, an average scaled stride height of 0.09 and an average stride duration of 1.08 seconds. Based on their height of 1.71 meters, their average stride length and stride height are 1.61 and 0.15, respectively.
Based on the median filter the interquarile range of the first difference of Participant 3’s scaled stride length, scaled stride height and stride duration are 0.003, 0, and 0, respectively.
The data processed with the median filtered approach is stored at the following location in our GitHub Repository.
Based on the median filter, Participant 4 has an average scaled stride length of 1.31, an average scaled stride height of 0.13 and an average stride duration of 0.93 seconds. Based on their height of 1.59 meters, their average stride length and stride height are 2.09 and 0.2, respectively.
Based on the median filter the interquarile range of the first difference of Participant 4’s scaled stride length, scaled stride height and stride duration are 0.002, 0, and 0, respectively.
The data processed with the median filtered approach is stored at the following location in our GitHub Repository.
Based on the median filter, Participant 5 has an average scaled stride length of 0.98, an average scaled stride height of 0.09 and an average stride duration of 1.09 seconds. Based on their height of 1.69 meters, their average stride length and stride height are 1.66 and 0.15, respectively.
Based on the median filter the interquarile range of the first difference of Participant 5’s scaled stride length, scaled stride height and stride duration are 0.002, 0, and 0, respectively.
The data processed with the median filtered approach is stored at the following location in our GitHub Repository.
Based on the median filter, Participant 6 has an average scaled stride length of 0.92, an average scaled stride height of 0.11 and an average stride duration of 1.05 seconds. Based on their height of 1.63 meters, their average stride length and stride height are 1.5 and 0.17, respectively.
Based on the median filter the interquarile range of the first difference of Participant 6’s scaled stride length, scaled stride height and stride duration are 0.002, 0, and 0, respectively.
The data processed with the median filtered approach is stored at the following location in our GitHub Repository.
Based on the median filter, Participant 7 has an average scaled stride length of 0.85, an average scaled stride height of 0.09 and an average stride duration of 1.13 seconds. Based on their height of 1.6 meters, their average stride length and stride height are 1.36 and 0.15, respectively.
Based on the median filter the interquarile range of the first difference of Participant 7’s scaled stride length, scaled stride height and stride duration are 0.001, 0, and 0, respectively.
The data processed with the median filtered approach is stored at the following location in our GitHub Repository.
Based on the median filter, Participant 8 has an average scaled stride length of 1.59, an average scaled stride height of 0.14 and an average stride duration of 0.96 seconds. Based on their height of 1.71 meters, their average stride length and stride height are 2.72 and 0.24, respectively.
Based on the median filter the interquarile range of the first difference of Participant 8’s scaled stride length, scaled stride height and stride duration are 0.004, 0, and 0, respectively.
The data processed with the median filtered approach is stored at the following location in our GitHub Repository.
Based on the median filter, Participant 9 has an average scaled stride length of 0.87, an average scaled stride height of 0.1 and an average stride duration of 1.11 seconds. Based on their height of 1.67 meters, their average stride length and stride height are 1.46 and 0.17, respectively.
Based on the median filter the interquarile range of the first difference of Participant 9’s scaled stride length, scaled stride height and stride duration are 0.002, 0, and 0, respectively.
The data processed with the median filtered approach is stored at the following location in our GitHub Repository.
Based on the median filter, Participant 10 has an average scaled stride length of 1.34, an average scaled stride height of 0.11 and an average stride duration of 0.99 seconds. Based on their height of 1.78 meters, their average stride length and stride height are 2.38 and 0.2, respectively.
Based on the median filter the interquarile range of the first difference of Participant 10’s scaled stride length, scaled stride height and stride duration are 0.003, 0, and 0, respectively.
The data processed with the median filtered approach is stored at the following location in our GitHub Repository.
Based on the median filter, Participant 11 has an average scaled stride length of 1.22, an average scaled stride height of 0.09 and an average stride duration of 1.05 seconds. Based on their height of 1.68 meters, their average stride length and stride height are 2.04 and 0.15, respectively.
Based on the median filter the interquarile range of the first difference of Participant 11’s scaled stride length, scaled stride height and stride duration are 0.002, 0, and 0, respectively.
The data processed with the median filtered approach is stored at the following location in our GitHub Repository.
Based on the median filter, Participant 12 has an average scaled stride length of 1.64, an average scaled stride height of 0.14 and an average stride duration of 0.88 seconds. Based on their height of 1.55 meters, their average stride length and stride height are 2.55 and 0.22, respectively.
Based on the median filter the interquarile range of the first difference of Participant 12’s scaled stride length, scaled stride height and stride duration are 0.001, 0, and 0, respectively.
The data processed with the median filtered approach is stored at the following location in our GitHub Repository.
Based on the median filter, Participant 13 has an average scaled stride length of 0.68, an average scaled stride height of 0.07 and an average stride duration of 1.15 seconds. Based on their height of 1.83 meters, their average stride length and stride height are 1.24 and 0.13, respectively.
Based on the median filter the interquarile range of the first difference of Participant 13’s scaled stride length, scaled stride height and stride duration are 0.002, 0, and 0, respectively.
The data processed with the median filtered approach is stored at the following location in our GitHub Repository.
Based on the median filter, Participant 14 has an average scaled stride length of 1.37, an average scaled stride height of 0.14 and an average stride duration of 0.92 seconds. Based on their height of 1.81 meters, their average stride length and stride height are 2.48 and 0.25, respectively.
Based on the median filter the interquarile range of the first difference of Participant 14’s scaled stride length, scaled stride height and stride duration are 0.002, 0, and 0, respectively.
The data processed with the median filtered approach is stored at the following location in our GitHub Repository.
Based on the median filter, Participant 15 has an average scaled stride length of 1.22, an average scaled stride height of 0.1 and an average stride duration of 1.03 seconds. Based on their height of 1.89 meters, their average stride length and stride height are 2.3 and 0.19, respectively.
Based on the median filter the interquarile range of the first difference of Participant 15’s scaled stride length, scaled stride height and stride duration are 0.002, 0, and 0, respectively.
The data processed with the median filtered approach is stored at the following location in our GitHub Repository.
# Saving a list of cleaned
save(subject1_medianf, subject2_medianf, subject3_medianf, subject4_medianf,
subject5_medianf, subject6_medianf, subject7_medianf, subject8_medianf,
subject9_medianf, subject10_medianf, subject11_medianf, subject12_medianf,
subject13_medianf, subject14_medianf, subject15_medianf,
subject1_median_diff_f, subject2_median_diff_f, subject3_median_diff_f,
subject4_median_diff_f,
subject5_median_diff_f, subject6_median_diff_f, subject7_median_diff_f,
subject8_median_diff_f,
subject9_median_diff_f, subject10_median_diff_f, subject11_median_diff_f,
subject12_median_diff_f,
subject13_median_diff_f, subject14_median_diff_f, subject15_median_diff_f,
file = "./Data/RGenerated/MedianFilteredData.RData")
load(file = "./Data/RGenerated/MedianFilteredData.RData")
for (i in 1:15) {
df <- get(paste0("subject",i,"_medianf"))
means <- colMeans(df)
df[,2] <- cumsum(df[,2]-means[2])
df[,3] <- cumsum(df[,3]-means[3])
df[,4] <- cumsum(df[,4]-means[4])
assign(paste0("subject",i,"_cusum"), df)
cat('###',paste0("P", i), "{-}",'\n')
df_transformed <- melt(get(paste0("subject",i,"_cusum")),
id.vars = "time.from.start",
measure.vars=c("scaled.stride.len",
"scaled.stride.height",
"stride.duration"
)
) # ggplot data needs to be tall
assign(paste0("g",i),
ggplot(data = df_transformed,
aes(x=time.from.start, y=value, group=variable,
color=variable,shape=variable)) +
geom_line() + theme_bw() +
ggtitle("CUSUM of the Median Filtered Data") +
theme(legend.position="none",
#axis.text.x=element_text(angle=90,hjust=1),
plot.title = element_text(hjust = 0.5)) +
facet_wrap(~variable,nrow=3, scales = "free_y")
)
print(get(paste0("g",i)))
cat('\n')
cat('<source> <p> The results from the analysis above can be found at: <a href="https://github.com/fmegahed/fatigue-changepoint/blob/master/Data/RGenerated/CusumData.RData">CusumData.RData</a>. </p> </source>')
cat('\n \n')
}
The results from the analysis above can be found at: CusumData.RData.
save(subject1_cusum, subject2_cusum, subject3_cusum,
subject4_cusum, subject5_cusum, subject6_cusum,
subject7_cusum, subject8_cusum, subject9_cusum,
subject10_cusum, subject11_cusum, subject12_cusum,
subject13_cusum, subject14_cusum, subject15_cusum,
file = "./Data/RGenerated/CusumData.RData")
A common approach for changepoint detection is to project the multivariate problem into a univariate-space using principal component analysis. Then, apply a univariate changepoint detection approach on the problem. We examine this approach in this section.
When we tested this, we found the following:
(A) The e.divise() produces a lot of changepoints with the deafult sig.level;
(B) Number of changepoints is still large with sig.lvl = 0.001;
For this reason, we do not think that this approach is promising. We will move on to the multivariate approach.
RatingThresh <- 13 # based on Maman et al. 2017
RPE = read.csv("https://raw.githubusercontent.com/fmegahed/fatigue-changepoint/master/Data/Raw/RPE.csv") # to read subjective rating of RPE data from GitHub'
colnames(RPE)[1] <- "time.from.start"
df_RPE_transformed <- melt(RPE,
id.vars = "time.from.start",
measure.vars = colnames(RPE)[2:16])
df_RPE_transformed[,2] <- paste0(df_RPE_transformed[,2],".RPE")
# to load previously generated data and changepoints
load(file = "./Data/RGenerated/MedianFilteredData.RData")
pca_subjects <- list()
pca_subjects_loadings <- list()
pc_prop_var_explained <- list()
changepoints_pca_ecp <- list()
changepoints_subj_thr <- list()
changepoints_subj_ecp <- list()
changepoints_cusum_subj_ecp <- list()
for (i in 1:15) {
cumsum_RPE <- data.frame(matrix(ncol=3, nrow=19)) # need to initialize every time
colnames(cumsum_RPE) <- c("time.from.start","variable","value")
correct_subject_rows <- which(df_RPE_transformed[,2]==paste0("Subject.",i,".RPE"))
dataholder <- df_RPE_transformed[correct_subject_rows,]
# Computing the CUSUMs of the RPEs
cumsum_RPE[,1] <- dataholder[,1]
cumsum_RPE[,2] <- paste0(dataholder[,2],".CUSUM")
cumsum_RPE[,3] <- cumsum(dataholder[,3]-13)
# -------- Calculation of Fatigue based on RPE>= 13 -----------------
fatigue_from_RPE_threshold <- which(RPE[,i+1]>=RatingThresh) %>%
min() # returns window number
changepoints_subj_thr[[i]] <- RPE[fatigue_from_RPE_threshold,1]
df_RPE<-data.frame(d1=RPE[1],t1=RPE[i+1]) # to make dataframe of subjective ratings for ggplot
#------- Estimation of Fatigue Changepoints based on ECP of the RPE Data --
y <- e.divisive(df_RPE,min.size = 2)$estimates
fatigue_from_ecp_changepoint = y[2:(length(y)-1)] # removing the first and last points as trivials changepoints
changepoints_subj_ecp[[i]] <- df_RPE[fatigue_from_ecp_changepoint,1]
# ------ Repeating the above analysis for the CUSUMs of the RPE --------
y_cumsum <- e.divisive(cumsum_RPE[,c(1,3)],min.size = 2)$estimates
fatigue_ecp_CusumRPE <- y_cumsum[2:length(y_cumsum)-1]
changepoints_cusum_subj_ecp[[i]] <- cumsum_RPE[fatigue_ecp_CusumRPE, 1]
# ------Computing the First Principal Component and the loadings for First PC ---
df <- paste0("subject",i,"_medianf") %>% get()
pca_dataholder <- prcomp(df[,2:4], center=TRUE, scale. = T) # Scale <- Standardize Vars
pca_subjects_loadings[[i]] <- pca_dataholder$rotation[,1] # Weights for First PC
# Understanding the % Variation Explained by the First PC
pc_std <- pca_dataholder$sdev
pc_var <- pc_std^2
pc_prop_var_explained[[i]] <- round(pc_var[1]/sum(pc_var),3)
# Storing the Data for First PC rotated values
pca_subjects[[i]] <- data.frame(time.from.start= df[,1],
variable= "PC1.Values",
value =pca_dataholder$x[,1]) # First PC
df_transformed <- pca_subjects[[i]]
df_transformed <- rbind(df_transformed,df_RPE_transformed[correct_subject_rows,],
cumsum_RPE)
# The Changepoints for the PCA Data based on e.divisive
pca_df <- pca_subjects[[i]]
z <- e.divisive(pca_df[,c(1,3)],min.size = 120, sig.lvl = 0.001)$estimates
changepoints_pca_ecp[[i]] <- pca_df[z[2:(length(z)-1)],]
cat('###',paste0("P", i), "{-}",'\n')
assign(paste0("g",i), biplot(pca_dataholder,
scale = 0))
print(get(paste0("g",i)))
cat("\n")
pca_vector <- pca_df[,c(1)]
assign(paste0("h",i),
ggplot(data = df_transformed,
aes(x=time.from.start, y=value, colour = variable)) +
geom_line(size=1) +
theme_bw() +
scale_colour_manual(values = c("maroon","black","grey40")) +
facet_wrap(~variable,nrow=3, scales = "free_y") +
# labs(title = paste("A paneled time-series plot for Participant",i)) +
geom_vline(xintercept= pca_vector,
show.legend = T, size=0.5) +
geom_vline(xintercept= changepoints_subj_thr[[i]],
linetype="solid", color="steelblue1",
show.legend = T, size=2) +
geom_vline(xintercept= changepoints_subj_ecp[[i]],
linetype="dotted", color="gray80",
show.legend =T, size=2) +
geom_vline(xintercept= changepoints_cusum_subj_ecp[[i]],
linetype="dashed", color="orange1",
show.legend =T, size=3) +
theme(legend.position="none",
plot.title = element_text(hjust = 0.5))
)
print(get(paste0("h",i)))
cat('\n \n')
cat(paste0('<source> <p> In the above figure, the colored vertical lines correspond to the following. First, the <b> black lines are used to denote the changepoints based on the non-parameteric, univariate, ECP approach based on the e.divisive(min.size=2) on the First Principal Component of the Kinematic Features. </b> Second, the <font color= #63B8FF> <b>steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. </b> </font> Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the <b> <font color=#CCCCCC> gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. </font> </b>',
'. Fourth, the <b> <font color=#FFA500> orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values. </font> </b> \n \n'))
cat(paste0('<source> <p> Based on the <b>ecp package, the number of changepoints for the First PC of the median filtered data for participant ', i,
' is equal to: ',
length(changepoints_pca_ecp[[i]]),
'. These changepoints are located at: ',
paste(changepoints_pca_ecp[[i]], collapse = ", "),
'. <font color= #63B8FF> The threshold based changepoint in the subjective RPE is located at: ',
paste(changepoints_subj_thr[[i]]),
'. </font> </b> In addition, <b> <font color=#CCCCCC> the number of changepoints based on the RPE values for this subject is equal to: ',
length(changepoints_subj_ecp[[i]]),
". These changepoints are located at: ",
paste(changepoints_subj_ecp[[i]], collapse = ", "),
". </font> </b> Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. <b> <font color=#FFA500> The number of these changepoints is equal to: ",
length(changepoints_cusum_subj_ecp[[i]]),
". These changepoints are located at: ",
paste(changepoints_cusum_subj_ecp[[i]], collapse = ", "),
'. </b> </font> The results from the analysis above can be found at: <a href="https://github.com/fmegahed/fatigue-changepoint/blob/master/Data/RGenerated/PCAChangepointsRPE.RData">PCAChangepointsRPE.RData</a>. </p> </source>')
)
cat('\n \n')
}
NULL
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, univariate, ECP approach based on the e.divisive(min.size=2) on the First Principal Component of the Kinematic Features. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the First PC of the median filtered data for participant 1 is equal to: 3. These changepoints are located at: c(NA, 1.69921875), c(NA, 1), c(NA, -1.03940977461133). The threshold based changepoint in the subjective RPE is located at: 6600. In addition, the number of changepoints based on the RPE values for this subject is equal to: 4. These changepoints are located at: 1800, 4200, 6600, 9000. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 3600, 6000, 8400. The results from the analysis above can be found at: PCAChangepointsRPE.RData.
NULL
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, univariate, ECP approach based on the e.divisive(min.size=2) on the First Principal Component of the Kinematic Features. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the First PC of the median filtered data for participant 2 is equal to: 3. These changepoints are located at: c(NA, 5.05859375), c(NA, 1), c(NA, 1.22030006940124). The threshold based changepoint in the subjective RPE is located at: 6000. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 3600, 6000, 7800. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 3600, 6000, 8400. The results from the analysis above can be found at: PCAChangepointsRPE.RData.
NULL
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, univariate, ECP approach based on the e.divisive(min.size=2) on the First Principal Component of the Kinematic Features. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the First PC of the median filtered data for participant 3 is equal to: 3. These changepoints are located at: c(NA, 1.81640625), c(NA, 1), c(NA, -1.17124953688842). The threshold based changepoint in the subjective RPE is located at: 4200. In addition, the number of changepoints based on the RPE values for this subject is equal to: 2. These changepoints are located at: 3600, 7200. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 3000, 5400, 7800. The results from the analysis above can be found at: PCAChangepointsRPE.RData.
NULL
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, univariate, ECP approach based on the e.divisive(min.size=2) on the First Principal Component of the Kinematic Features. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the First PC of the median filtered data for participant 4 is equal to: 3. These changepoints are located at: c(NA, 1.494140625), c(NA, 1), c(NA, -2.64623541509993). The threshold based changepoint in the subjective RPE is located at: 3000. In addition, the number of changepoints based on the RPE values for this subject is equal to: 2. These changepoints are located at: 3600, 7200. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 3600, 7200. The results from the analysis above can be found at: PCAChangepointsRPE.RData.
NULL
## Warning: Removed 10 rows containing missing values (geom_path).
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, univariate, ECP approach based on the e.divisive(min.size=2) on the First Principal Component of the Kinematic Features. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the First PC of the median filtered data for participant 5 is equal to: 3. These changepoints are located at: c(NA, 10.52734375), c(NA, 1), c(NA, -0.488757398878383). The threshold based changepoint in the subjective RPE is located at: 2400. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 2400, 4800, 7800. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 2400, 4800, 7800. The results from the analysis above can be found at: PCAChangepointsRPE.RData.
NULL
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, univariate, ECP approach based on the e.divisive(min.size=2) on the First Principal Component of the Kinematic Features. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the First PC of the median filtered data for participant 6 is equal to: 3. These changepoints are located at: c(NA, 1.6796875), c(NA, 1), c(NA, -2.3085834822545). The threshold based changepoint in the subjective RPE is located at: 8400. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 2400, 4800, 7800. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 3600, 6600, 9000. The results from the analysis above can be found at: PCAChangepointsRPE.RData.
NULL
## Warning: Removed 14 rows containing missing values (geom_path).
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, univariate, ECP approach based on the e.divisive(min.size=2) on the First Principal Component of the Kinematic Features. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the First PC of the median filtered data for participant 7 is equal to: 3. These changepoints are located at: c(NA, 2.2265625), c(NA, 1), c(NA, 0.515308023840427). The threshold based changepoint in the subjective RPE is located at: 1800. In addition, the number of changepoints based on the RPE values for this subject is equal to: 2. These changepoints are located at: 3600, 7200. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 3600, 7200. The results from the analysis above can be found at: PCAChangepointsRPE.RData.
NULL
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, univariate, ECP approach based on the e.divisive(min.size=2) on the First Principal Component of the Kinematic Features. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the First PC of the median filtered data for participant 8 is equal to: 3. These changepoints are located at: c(NA, 2.1875), c(NA, 1), c(NA, -0.264098970906993). The threshold based changepoint in the subjective RPE is located at: 6600. In addition, the number of changepoints based on the RPE values for this subject is equal to: 1. These changepoints are located at: 4800. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 3600, 6000, 8400. The results from the analysis above can be found at: PCAChangepointsRPE.RData.
NULL
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, univariate, ECP approach based on the e.divisive(min.size=2) on the First Principal Component of the Kinematic Features. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the First PC of the median filtered data for participant 9 is equal to: 3. These changepoints are located at: c(NA, 1.982421875), c(NA, 1), c(NA, -2.93687639699979). The threshold based changepoint in the subjective RPE is located at: 3000. In addition, the number of changepoints based on the RPE values for this subject is equal to: 2. These changepoints are located at: 4200, 7800. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 2400, 4800, 7800. The results from the analysis above can be found at: PCAChangepointsRPE.RData.
NULL
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, univariate, ECP approach based on the e.divisive(min.size=2) on the First Principal Component of the Kinematic Features. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the First PC of the median filtered data for participant 10 is equal to: 3. These changepoints are located at: c(NA, 15.732421875), c(NA, 1), c(NA, 0.147654231594101). The threshold based changepoint in the subjective RPE is located at: 2400. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 2400, 4800, 7800. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 2400, 4800, 7800. The results from the analysis above can be found at: PCAChangepointsRPE.RData.
NULL
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, univariate, ECP approach based on the e.divisive(min.size=2) on the First Principal Component of the Kinematic Features. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the First PC of the median filtered data for participant 11 is equal to: 3. These changepoints are located at: c(NA, 1.97265625), c(NA, 1), c(NA, -4.48283612466273). The threshold based changepoint in the subjective RPE is located at: 10800. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 2400, 6000, 7800. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 3600, 6600, 9000. The results from the analysis above can be found at: PCAChangepointsRPE.RData.
NULL
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, univariate, ECP approach based on the e.divisive(min.size=2) on the First Principal Component of the Kinematic Features. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the First PC of the median filtered data for participant 12 is equal to: 3. These changepoints are located at: c(NA, 30.205078125), c(NA, 1), c(NA, -0.367814232820913). The threshold based changepoint in the subjective RPE is located at: 9000. In addition, the number of changepoints based on the RPE values for this subject is equal to: 2. These changepoints are located at: 4200, 7800. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 3600, 6600, 9000. The results from the analysis above can be found at: PCAChangepointsRPE.RData.
NULL
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, univariate, ECP approach based on the e.divisive(min.size=2) on the First Principal Component of the Kinematic Features. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the First PC of the median filtered data for participant 13 is equal to: 3. These changepoints are located at: c(NA, 2.3828125), c(NA, 1), c(NA, -2.57352735036413). The threshold based changepoint in the subjective RPE is located at: 6600. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 3600, 6000, 8400. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 3600, 7200. The results from the analysis above can be found at: PCAChangepointsRPE.RData.
NULL
## Warning: Removed 6 rows containing missing values (geom_path).
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, univariate, ECP approach based on the e.divisive(min.size=2) on the First Principal Component of the Kinematic Features. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the First PC of the median filtered data for participant 14 is equal to: 3. These changepoints are located at: c(NA, 2.2265625), c(NA, 1), c(NA, -1.92729475107289). The threshold based changepoint in the subjective RPE is located at: 3600. In addition, the number of changepoints based on the RPE values for this subject is equal to: 2. These changepoints are located at: 3600, 7800. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 3000, 5400, 7800. The results from the analysis above can be found at: PCAChangepointsRPE.RData.
NULL
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, univariate, ECP approach based on the e.divisive(min.size=2) on the First Principal Component of the Kinematic Features. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the First PC of the median filtered data for participant 15 is equal to: 3. These changepoints are located at: c(NA, 2.08984375), c(NA, 1), c(NA, -3.71564956530791). The threshold based changepoint in the subjective RPE is located at: 3600. In addition, the number of changepoints based on the RPE values for this subject is equal to: 2. These changepoints are located at: 3600, 7200. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 3000, 5400, 7800. The results from the analysis above can be found at: PCAChangepointsRPE.RData.
save(pca_subjects, pca_subjects_loadings, pc_prop_var_explained,
changepoints_pca_ecp,
changepoints_subj_thr, changepoints_subj_ecp, changepoints_cusum_subj_ecp,
file = "./Data/RGenerated/PCAChangepointsRPE.RData.RData")
To be filled later.
In this section, we will examine the use of the approach of Matteson and James (2014) for multiple change point analysis of our trivariate data. Our analysis takes advantage of their R package titled ecp, which is described in more details in the following paper: James and Matteson (2014). The multivariate changepoint method will be first tested using the median filtered data and then using the CUSUM of the median filtered data.
To apply the ecp approach of Matteson and James (2014) on the median filtered data, we need to use a window size (which reflects the smallest size of a shift that is to be detected). From a biomechanics perspective, we expect that the change of performance due to fatigue will be sustained at least for several minutes (until the person adapts and/or rests). Thus, the size of the window (\(\small m_{ecp}\)) should be \(\small m_{ecp} \ge 60\). In our analysis, we tried the following window sizes: 60, 120, 180, 300, and 600. For most of the participants, the number and/or location of change were almost similar. For demonstration purpose, we use \(\small m_{ecp} = 120\) below.
load(file = "./Data/RGenerated/MedianFilteredData.RData")
pen <- function(x) -length(x) #Equally penalizes every additional changepoint
window.size <- 120
changepoints_medianf <- list()
for (i in 1:15) {
df <- get(paste0("subject",i,"_medianf"))
# Making the DF divisiable by the window size
# (i.e., we are removing the remainder so that our chunks are equally sized)
rows.to.keep <- nrow(df[,2:4]) %/% window.size
remainder <- nrow(df[,2:4]) %% window.size
df.MultivariateChangepoint<- df[1:(nrow(df)-remainder),
2:4] %>% as.matrix()
# Utilizing the e.agglo() from the ecp package
mem <- rep(1:rows.to.keep, each=window.size)
y = e.agglo(X=df.MultivariateChangepoint,
member= mem,
alpha = 1,
penalty = pen)
changepoints_medianf[[i]] <- y$estimates
cat('####',paste0("P", i), "{-}",'\n')
df_transformed <- melt(get(paste0("subject",i,"_medianf")),
id.vars = "time.from.start",
measure.vars=c("scaled.stride.len",
"scaled.stride.height",
"stride.duration"
)
) # ggplot data needs to be tall
assign(paste0("g",i),
ggplot(data = df_transformed,
aes(x=time.from.start, y=value, group=variable,
color=variable,shape=variable)) +
geom_line() + theme_bw() +
ggtitle("Changepoints for the Median Filtered Data (at vertical black lines)") +
theme(legend.position="none",
#axis.text.x=element_text(angle=90,hjust=1),
plot.title = element_text(hjust = 0.5)) +
facet_wrap(~variable,nrow=3, scales = "free_y") +
geom_vline(xintercept= y$estimates)
)
print(get(paste0("g",i)))
cat('\n')
cat(paste0('<source> <p> Based on the <b>ecp package</b>, the number of changepoints for participant ', i,
' is equal to: ',
length(changepoints_medianf[[i]]),
'. These changepoints are located at: ',
paste(changepoints_medianf[[i]], collapse = ", "),
'. The results from the analysis above can be found at: <a href="https://github.com/fmegahed/fatigue-changepoint/blob/master/Data/RGenerated/ECPChangePointsMFData.RData">ECPChangePointsMFData.RData</a>. </p> </source>')
)
cat('\n \n')
}
Based on the ecp package, the number of changepoints for participant 1 is equal to: 3. These changepoints are located at: 841, 3001, 4441. The results from the analysis above can be found at: ECPChangePointsMFData.RData.
Based on the ecp package, the number of changepoints for participant 2 is equal to: 2. These changepoints are located at: 241, 3481. The results from the analysis above can be found at: ECPChangePointsMFData.RData.
Based on the ecp package, the number of changepoints for participant 3 is equal to: 2. These changepoints are located at: 1081, 1321. The results from the analysis above can be found at: ECPChangePointsMFData.RData.
Based on the ecp package, the number of changepoints for participant 4 is equal to: 4. These changepoints are located at: 1, 1321, 5161, 7681. The results from the analysis above can be found at: ECPChangePointsMFData.RData.
Based on the ecp package, the number of changepoints for participant 5 is equal to: 2. These changepoints are located at: 1801, 2281. The results from the analysis above can be found at: ECPChangePointsMFData.RData.
Based on the ecp package, the number of changepoints for participant 6 is equal to: 4. These changepoints are located at: 1561, 2521, 3241, 5401. The results from the analysis above can be found at: ECPChangePointsMFData.RData.
Based on the ecp package, the number of changepoints for participant 7 is equal to: 3. These changepoints are located at: 1, 3241, 3481. The results from the analysis above can be found at: ECPChangePointsMFData.RData.
Based on the ecp package, the number of changepoints for participant 8 is equal to: 2. These changepoints are located at: 1081, 4201. The results from the analysis above can be found at: ECPChangePointsMFData.RData.
Based on the ecp package, the number of changepoints for participant 9 is equal to: 3. These changepoints are located at: 1, 5041, 5641. The results from the analysis above can be found at: ECPChangePointsMFData.RData.
Based on the ecp package, the number of changepoints for participant 10 is equal to: 3. These changepoints are located at: 1, 3241, 6601. The results from the analysis above can be found at: ECPChangePointsMFData.RData.
Based on the ecp package, the number of changepoints for participant 11 is equal to: 3. These changepoints are located at: 1, 1561, 7441. The results from the analysis above can be found at: ECPChangePointsMFData.RData.
Based on the ecp package, the number of changepoints for participant 12 is equal to: 5. These changepoints are located at: 1, 2161, 2401, 6121, 6241. The results from the analysis above can be found at: ECPChangePointsMFData.RData.
Based on the ecp package, the number of changepoints for participant 13 is equal to: 3. These changepoints are located at: 1, 841, 6361. The results from the analysis above can be found at: ECPChangePointsMFData.RData.
Based on the ecp package, the number of changepoints for participant 14 is equal to: 3. These changepoints are located at: 1801, 2041, 4801. The results from the analysis above can be found at: ECPChangePointsMFData.RData.
Based on the ecp package, the number of changepoints for participant 15 is equal to: 4. These changepoints are located at: 1, 4081, 6361, 6961. The results from the analysis above can be found at: ECPChangePointsMFData.RData.
save(changepoints_medianf,
file = "./Data/RGenerated/ECPChangePointsMFData.RData")
Here, we apply the ecp approach of Matteson and James (2014) on the median filtered differenced data (as opposed to the median filtered timeseries). For demonstration purpose, we use \(\small m_{ecp} = 120\) below. ** Note that, here, we set the value of \(\small \alpha=2\) as opposed to 1 (elsewhere) since we are look for changes in the variation instead of the mean See their paper for more details.
load(file = "./Data/RGenerated/MedianFilteredData.RData")
pen <- function(x) -length(x) #Equally penalizes every additional changepoint
window.size <- 120
changepoints_medianf_diff <- list()
for (i in 1:15) {
df <- get(paste0("subject",i,"_median_diff_f"))
# Making the DF divisiable by the window size
# (i.e., we are removing the remainder so that our chunks are equally sized)
rows.to.keep <- nrow(df[,2:4]) %/% window.size
remainder <- nrow(df[,2:4]) %% window.size
df.MultivariateChangepoint<- df[1:(nrow(df)-remainder),
2:4] %>% as.matrix()
# Utilizing the e.agglo() from the ecp package
mem <- rep(1:rows.to.keep, each=window.size)
y = e.agglo(X=df.MultivariateChangepoint,
member= mem,
alpha = 2,
penalty = pen)
changepoints_medianf_diff[[i]] <- y$estimates
cat('####',paste0("P", i), "{-}",'\n')
df_transformed <- melt(get(paste0("subject",i,"_median_diff_f")),
id.vars = "diff.time.from.start",
measure.vars=c("diff.scaled.stride.len",
"diff.scaled.stride.height",
"diff.stride.duration"
)
) # ggplot data needs to be tall
assign(paste0("g",i),
ggplot(data = df_transformed,
aes(x=diff.time.from.start, y=value, group=variable,
color=variable,shape=variable)) +
geom_line() + theme_bw() +
ggtitle("Changepoints for the Median Filtered Differenced Data (at vertical black lines)") +
theme(legend.position="none",
#axis.text.x=element_text(angle=90,hjust=1),
plot.title = element_text(hjust = 0.5)) +
facet_wrap(~variable,nrow=3, scales = "free_y") +
geom_vline(xintercept= y$estimates)
)
print(get(paste0("g",i)))
cat('\n')
cat(paste0('<source> <p> Based on the <b>ecp package</b>, the number of changepoints for the median filtered differenced data for participant ', i,
' is equal to: ',
length(changepoints_medianf_diff[[i]]),
'. These changepoints are located at: ',
paste(changepoints_medianf_diff[[i]], collapse = ", "),
'. The results from the analysis above can be found at: <a href="https://github.com/fmegahed/fatigue-changepoint/blob/master/Data/RGenerated/ECPChangePointsDiffMFData.RData">ECPChangePointsDiffMFData.RData</a>. </p> </source>')
)
cat('\n \n')
}
Based on the ecp package, the number of changepoints for the median filtered differenced data for participant 1 is equal to: 1. These changepoints are located at: 1441. The results from the analysis above can be found at: ECPChangePointsDiffMFData.RData.
Based on the ecp package, the number of changepoints for the median filtered differenced data for participant 2 is equal to: 1. These changepoints are located at: 4921. The results from the analysis above can be found at: ECPChangePointsDiffMFData.RData.
Based on the ecp package, the number of changepoints for the median filtered differenced data for participant 3 is equal to: 1. These changepoints are located at: 1441. The results from the analysis above can be found at: ECPChangePointsDiffMFData.RData.
Based on the ecp package, the number of changepoints for the median filtered differenced data for participant 4 is equal to: 1. These changepoints are located at: 3241. The results from the analysis above can be found at: ECPChangePointsDiffMFData.RData.
Based on the ecp package, the number of changepoints for the median filtered differenced data for participant 5 is equal to: 1. These changepoints are located at: 1921. The results from the analysis above can be found at: ECPChangePointsDiffMFData.RData.
Based on the ecp package, the number of changepoints for the median filtered differenced data for participant 6 is equal to: 1. These changepoints are located at: 5521. The results from the analysis above can be found at: ECPChangePointsDiffMFData.RData.
Based on the ecp package, the number of changepoints for the median filtered differenced data for participant 7 is equal to: 1. These changepoints are located at: 2641. The results from the analysis above can be found at: ECPChangePointsDiffMFData.RData.
Based on the ecp package, the number of changepoints for the median filtered differenced data for participant 8 is equal to: 1. These changepoints are located at: 6481. The results from the analysis above can be found at: ECPChangePointsDiffMFData.RData.
Based on the ecp package, the number of changepoints for the median filtered differenced data for participant 9 is equal to: 1. These changepoints are located at: 2521. The results from the analysis above can be found at: ECPChangePointsDiffMFData.RData.
Based on the ecp package, the number of changepoints for the median filtered differenced data for participant 10 is equal to: 1. These changepoints are located at: 3241. The results from the analysis above can be found at: ECPChangePointsDiffMFData.RData.
Based on the ecp package, the number of changepoints for the median filtered differenced data for participant 11 is equal to: 1. These changepoints are located at: 1321. The results from the analysis above can be found at: ECPChangePointsDiffMFData.RData.
Based on the ecp package, the number of changepoints for the median filtered differenced data for participant 12 is equal to: 1. These changepoints are located at: 2161. The results from the analysis above can be found at: ECPChangePointsDiffMFData.RData.
Based on the ecp package, the number of changepoints for the median filtered differenced data for participant 13 is equal to: 2. These changepoints are located at: 1, 6361. The results from the analysis above can be found at: ECPChangePointsDiffMFData.RData.
Based on the ecp package, the number of changepoints for the median filtered differenced data for participant 14 is equal to: 1. These changepoints are located at: 2041. The results from the analysis above can be found at: ECPChangePointsDiffMFData.RData.
Based on the ecp package, the number of changepoints for the median filtered differenced data for participant 15 is equal to: 1. These changepoints are located at: 4321. The results from the analysis above can be found at: ECPChangePointsDiffMFData.RData.
save(changepoints_medianf_diff,
file = "./Data/RGenerated/ECPChangePointsDiffMFData.RData")
Here, we apply the ecp approach of Matterson and James (2014) on the cusums of the median filtered data (as opposed to the median filtered timeseries). For demonstration purpose, we use \(\small m_{ecp} = 120\) below.
load(file = "./Data/RGenerated/CusumData.RData")
pen <- function(x) -length(x) #Equally penalizes every additional changepoint
window.size <- 120
changepoints_cusum <- list()
for (i in 1:15) {
df <- get(paste0("subject",i,"_cusum"))
# Making the DF divisiable by the window size
# (i.e., we are removing the remainder so that our chunks are equally sized)
rows.to.keep <- nrow(df[,2:4]) %/% window.size
remainder <- nrow(df[,2:4]) %% window.size
df.MultivariateChangepoint<- df[1:(nrow(df)-remainder),
2:4] %>% as.matrix()
# Utilizing the e.agglo() from the ecp package
mem <- rep(1:rows.to.keep, each=window.size)
y = e.agglo(X=df.MultivariateChangepoint,
member= mem,
alpha = 1,
penalty = pen)
changepoints_cusum[[i]] <- y$estimates
cat('####',paste0("P", i), "{-}",'\n')
df_transformed <- melt(get(paste0("subject",i,"_cusum")),
id.vars = "time.from.start",
measure.vars=c("scaled.stride.len",
"scaled.stride.height",
"stride.duration"
)
) # ggplot data needs to be tall
assign(paste0("g",i),
ggplot(data = df_transformed,
aes(x=time.from.start, y=value, group=variable,
color=variable,shape=variable)) +
geom_line() + theme_bw() +
ggtitle("Changepoints for the CUSUMs of the Median Filtered Data (vertical black lines)") +
theme(legend.position="none",
#axis.text.x=element_text(angle=90,hjust=1),
plot.title = element_text(hjust = 0.5)) +
facet_wrap(~variable,nrow=3, scales = "free_y") +
geom_vline(xintercept= y$estimates)
)
print(get(paste0("g",i)))
cat('\n')
cat(paste0('<source> <p> Based on the <b>ecp package</b>, the number of changepoints for the cusums of the median filtered data for participant ', i,
' is equal to: ',
length(changepoints_cusum[[i]]),
'. These changepoints are located at: ',
paste(changepoints_cusum[[i]], collapse = ", "),
'. The results from the analysis above can be found at: <a href="https://github.com/fmegahed/fatigue-changepoint/blob/master/Data/RGenerated/ECPChangePointsCUSUM.RData">ECPChangePointsCUSUM.RData</a>. </p> </source>')
)
cat('\n \n')
}
Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 1 is equal to: 3. These changepoints are located at: 1, 3481, 4801. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.
Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 2 is equal to: 2. These changepoints are located at: 2641, 4201. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.
Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 3 is equal to: 3. These changepoints are located at: 601, 1201, 1561. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.
Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 4 is equal to: 4. These changepoints are located at: 721, 3721, 5161, 7441. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.
Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 5 is equal to: 2. These changepoints are located at: 841, 1921. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.
Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 6 is equal to: 2. These changepoints are located at: 121, 4561. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.
Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 7 is equal to: 4. These changepoints are located at: 1, 721, 1801, 3481. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.
Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 8 is equal to: 5. These changepoints are located at: 1, 1561, 3481, 4561, 7441. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.
Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 9 is equal to: 2. These changepoints are located at: 3361, 5281. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.
Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 10 is equal to: 2. These changepoints are located at: 961, 5521. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.
Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 11 is equal to: 2. These changepoints are located at: 1081, 6361. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.
Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 12 is equal to: 4. These changepoints are located at: 2281, 3241, 5521, 6121. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.
Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 13 is equal to: 4. These changepoints are located at: 361, 1441, 3841, 5761. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.
Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 14 is equal to: 3. These changepoints are located at: 1081, 1921, 3721. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.
Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 15 is equal to: 2. These changepoints are located at: 1441, 5281. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.
save(changepoints_cusum,
file = "./Data/RGenerated/ECPChangePointsCUSUM.RData")
To compare the performance of the ECP approach, we also investigated using the approach of Cappizi and Masarotto (2016) for another nonparameteric multivariate changepoint analysis method. Similar to the ecp approach, we applied their methodology on the: (a) median filtered data; and (b) cusums of the median filtered data. The results are shown in the following subsections. Note that we did not perform the analysis on the differences since we concluded that it was less informative than (a) and (b) based on our conclusions from Section 4.1.
For the parameterization of the Cappizi and Masarotto (2016) approach, we used the default parameters of the mphase1() function from the dphase1 package with the exception of:
– isolated = FALSE since we do not expect shifts to last only 120 strides.
– plot = FALSE since we would like to use ggplot package instead of the default plot function.
Notes:
Similar to the analysis in Section 4.1, we are using the window size of 120 (\(\small m_{dphase1}=120\)). This allows us to compare the results (in terms of the number of detected changepoints and whether these match better from a graphical perspective).
Note that we have used the default \(\small lmin=5\), which means that the mphase1 method will ignore any changepoints within 5 windows of the previously detected changepoint. While this is different than the ECP method, we still have more changepoints (and thus, we have elected against changing this default setting).
load(file = "./Data/RGenerated/MedianFilteredData.RData")
changepoints_mph1_mf <- list()
for (i in 1:15) {
df <- get(paste0("subject",i,"_medianf"))
window.size <- 120
rows.to.keep <- nrow(df[,2:4]) %/% window.size
remainder <- nrow(df[,2:4]) %% window.size
df.MultivariateChangepoint<- df[1:(nrow(df)-remainder),
2:4]
dataholder.array <- t(df.MultivariateChangepoint) %>%
array(c(3, window.size, rows.to.keep))
y <- mphase1(dataholder.array,
isolated=FALSE, plot = FALSE)
mcp <- (df[unlist(y$alasso[2]),1]*window.size) %>% round(digits = 0)
mcp <- mcp[order(mcp)]
mcp <- (mcp - window.size) + 1 # to make the change point at the begining of the period and not end (to be similar to the ecp method)
changepoints_mph1_mf[[i]] <- y
cat('####',paste0("P", i), "{-}",'\n')
df_transformed <- melt(get(paste0("subject",i,"_medianf")),
id.vars = "time.from.start",
measure.vars=c("scaled.stride.len",
"scaled.stride.height",
"stride.duration"
)
) # ggplot data needs to be tall
assign(paste0("g",i),
ggplot(data = df_transformed,
aes(x=time.from.start, y=value, group=variable,
color=variable,shape=variable)) +
geom_line() + theme_bw() +
ggtitle("Changepoints for the Median Filtered Data (w/ dfphase1 package)") +
theme(legend.position="none",
#axis.text.x=element_text(angle=90,hjust=1),
plot.title = element_text(hjust = 0.5)) +
facet_wrap(~variable,nrow=3, scales = "free_y") +
geom_vline(xintercept= mcp)
)
print(get(paste0("g",i)))
cat('\n')
cat(paste0('<source> <p> Based on the <b>dfphase1 package</b>, the number of changepoints for the cusums of the median filtered data for participant ', i,
' is equal to: ',
length(mcp),
'. These changepoints are located at: ',
paste(mcp, collapse = ", "),
'. The results from the analysis above can be found at: <a href="https://github.com/fmegahed/fatigue-changepoint/blob/master/Data/RGenerated/ChangePointsMedianFilterMphase1.RData">ChangePointsMedianFilterMphase1.RData</a>. </p> </source>')
)
cat('\n \n')
}
Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 1 is equal to: 4. These changepoints are located at: 1270, 8122, 9121, 10231. The results from the analysis above can be found at: ChangePointsMedianFilterMphase1.RData.
Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 2 is equal to: 5. These changepoints are located at: 1110, 2478, 3768, 4652, 5309. The results from the analysis above can be found at: ChangePointsMedianFilterMphase1.RData.
Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 3 is equal to: 1. These changepoints are located at: 1015. The results from the analysis above can be found at: ChangePointsMedianFilterMphase1.RData.
Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 4 is equal to: 8. These changepoints are located at: 773, 1513, 3988, 6087, 7567, 8201, 9036, 9774. The results from the analysis above can be found at: ChangePointsMedianFilterMphase1.RData.
Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 5 is equal to: 3. These changepoints are located at: 2345, 3122, 4152. The results from the analysis above can be found at: ChangePointsMedianFilterMphase1.RData.
Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 6 is equal to: 6. These changepoints are located at: 720, 1592, 2474, 3346, 4355, 5110. The results from the analysis above can be found at: ChangePointsMedianFilterMphase1.RData.
Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 7 is equal to: 3. These changepoints are located at: 999, 1972, 2934. The results from the analysis above can be found at: ChangePointsMedianFilterMphase1.RData.
Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 8 is equal to: 8. These changepoints are located at: 997, 1843, 3503, 5517, 6493, 7204, 8277, 8991. The results from the analysis above can be found at: ChangePointsMedianFilterMphase1.RData.
Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 9 is equal to: 5. These changepoints are located at: 774, 1951, 2882, 4274, 7502. The results from the analysis above can be found at: ChangePointsMedianFilterMphase1.RData.
Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 10 is equal to: 7. These changepoints are located at: 2615, 3485, 4326, 5174, 5990, 6808, 7760. The results from the analysis above can be found at: ChangePointsMedianFilterMphase1.RData.
Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 11 is equal to: 6. These changepoints are located at: 1217, 2562, 5458, 7088, 8582, 9319. The results from the analysis above can be found at: ChangePointsMedianFilterMphase1.RData.
Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 12 is equal to: 6. These changepoints are located at: 4668, 5336, 6286, 6952, 7934, 8628. The results from the analysis above can be found at: ChangePointsMedianFilterMphase1.RData.
Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 13 is equal to: 6. These changepoints are located at: 2371, 4082, 5069, 5896, 7150, 7983. The results from the analysis above can be found at: ChangePointsMedianFilterMphase1.RData.
Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 14 is equal to: 4. These changepoints are located at: 1134, 1914, 7931, 9336. The results from the analysis above can be found at: ChangePointsMedianFilterMphase1.RData.
Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 15 is equal to: 6. These changepoints are located at: 1351, 2071, 3543, 4415, 6939, 9092. The results from the analysis above can be found at: ChangePointsMedianFilterMphase1.RData.
save(changepoints_mph1_mf,
file = "./Data/RGenerated/ChangePointsMedianFilterMphase1.RData")
load(file = "./Data/RGenerated/CusumData.RData")
changepoints_mph1_cusum <- list()
for (i in 1:15) {
df <- get(paste0("subject",i,"_cusum"))
window.size <- 120
rows.to.keep <- nrow(df[,2:4]) %/% window.size
remainder <- nrow(df[,2:4]) %% window.size
df.MultivariateChangepoint<- df[1:(nrow(df)-remainder),
2:4]
dataholder.array <- t(df.MultivariateChangepoint) %>%
array(c(3, window.size, rows.to.keep))
y <- mphase1(dataholder.array,
isolated=FALSE, plot = FALSE)
mcp <- (df[unlist(y$alasso[2]),1]*window.size) %>% round(digits = 0)
mcp <- mcp[order(mcp)]
mcp <- (mcp - window.size) + 1 # to make the change point at the begining of the period and not end (to be similar to the ecp method)
changepoints_mph1_cusum[[i]] <- y
cat('####',paste0("P", i), "{-}",'\n')
df_transformed <- melt(get(paste0("subject",i,"_cusum")),
id.vars = "time.from.start",
measure.vars=c("scaled.stride.len",
"scaled.stride.height",
"stride.duration"
)
) # ggplot data needs to be tall
assign(paste0("g",i),
ggplot(data = df_transformed,
aes(x=time.from.start, y=value, group=variable,
color=variable,shape=variable)) +
geom_line() + theme_bw() +
ggtitle("Changepoints for the CUSUMs of the Median Filtered Data (w/ dfphase1 package)") +
theme(legend.position="none",
#axis.text.x=element_text(angle=90,hjust=1),
plot.title = element_text(hjust = 0.5)) +
facet_wrap(~variable,nrow=3, scales = "free_y") +
geom_vline(xintercept= mcp)
)
print(get(paste0("g",i)))
cat('\n')
cat(paste0('<source> <p> Based on the <b>dfphase1 package</b>, the number of changepoints for the cusums of the median filtered data for participant ', i,
' is equal to: ',
length(mcp),
'. These changepoints are located at: ',
paste(mcp, collapse = ", "),
'. The results from the analysis above can be found at: <a href="https://github.com/fmegahed/fatigue-changepoint/blob/master/Data/RGenerated/ChangePointsCusumMphase1.RData">ChangePointsCusumMphase1.RData</a>. </p> </source>')
)
cat('\n \n')
}
Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 1 is equal to: 5. These changepoints are located at: 1270, 7619, 8502, 9369, 10231. The results from the analysis above can be found at: ChangePointsCusumMphase1.RData.
Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 2 is equal to: 4. These changepoints are located at: 1110, 2048, 3548, 5091. The results from the analysis above can be found at: ChangePointsCusumMphase1.RData.
Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 3 is equal to: 1. These changepoints are located at: 1015. The results from the analysis above can be found at: ChangePointsCusumMphase1.RData.
Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 4 is equal to: 7. These changepoints are located at: 773, 3604, 5057, 7242, 8095, 9036, 9671. The results from the analysis above can be found at: ChangePointsCusumMphase1.RData.
Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 5 is equal to: 2. These changepoints are located at: 2474, 3504. The results from the analysis above can be found at: ChangePointsCusumMphase1.RData.
Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 6 is equal to: 5. These changepoints are located at: 1094, 2092, 3346, 4226, 5110. The results from the analysis above can be found at: ChangePointsCusumMphase1.RData.
Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 7 is equal to: 2. These changepoints are located at: 1693, 3204. The results from the analysis above can be found at: ChangePointsCusumMphase1.RData.
Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 8 is equal to: 7. These changepoints are located at: 753, 1588, 5130, 5894, 6848, 7561, 8636. The results from the analysis above can be found at: ChangePointsCusumMphase1.RData.
Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 9 is equal to: 5. These changepoints are located at: 1032, 1819, 3016, 4274, 7936. The results from the analysis above can be found at: ChangePointsCusumMphase1.RData.
Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 10 is equal to: 6. These changepoints are located at: 3237, 3973, 4687, 5874, 6693, 7524. The results from the analysis above can be found at: ChangePointsCusumMphase1.RData.
Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 11 is equal to: 6. These changepoints are located at: 1461, 2193, 4955, 6959, 7833, 8706. The results from the analysis above can be found at: ChangePointsCusumMphase1.RData.
Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 12 is equal to: 5. These changepoints are located at: 4327, 5116, 6399, 7707, 8740. The results from the analysis above can be found at: ChangePointsCusumMphase1.RData.
Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 13 is equal to: 6. These changepoints are located at: 1461, 3265, 4501, 6035, 7427, 8259. The results from the analysis above can be found at: ChangePointsCusumMphase1.RData.
Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 14 is equal to: 5. These changepoints are located at: 1246, 1914, 3549, 8648, 9336. The results from the analysis above can be found at: ChangePointsCusumMphase1.RData.
Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 15 is equal to: 7. These changepoints are located at: 984, 1951, 2806, 3666, 4675, 7203, 8961. The results from the analysis above can be found at: ChangePointsCusumMphase1.RData.
save(changepoints_mph1_cusum,
file = "./Data/RGenerated/ChangePointsCusumMphase1.RData")
Throughout the experiments, participants provided their subjective ratings every 10 minutes. The ratings were based on the Borg 6-20 RPE scale. Recent literature on fatigue detection suggests that a rating of 13 can be used to estimate the onset of fatigue (see Maman et al. (2017) for a detailed analysis). The subjective ratings for each participant can be accessed at: subjective-ratings-raw.
In this section, we perform the following tasks:
EDA, where we overlay the subjective ratings with the Median Filtered changepoints;
A threshold based comparison, where we use \(\small RPE \ge 13\) to identify when a participant becomes fatigued and we compare that to the changepoints detected by the hybrid CUSUM-EPC approach; and
Utilize the e.divisive() from the ecp package to obtain the changepoints for the Borg Ratings. Note that this function is the univariate approach for the multiple changepoint method examined in Section 4.1.
RatingThresh <- 13 # based on Maman et al. 2017
RPE = read.csv("https://raw.githubusercontent.com/fmegahed/fatigue-changepoint/master/Data/Raw/RPE.csv") # to read subjective rating of RPE data from GitHub'
colnames(RPE)[1] <- "time.from.start"
df_RPE_transformed <- melt(RPE,
id.vars = "time.from.start",
measure.vars = colnames(RPE)[2:16])
df_RPE_transformed[,2] <- paste0(df_RPE_transformed[,2],".RPE")
# to load previously generated data and changepoints
load(file = "./Data/RGenerated/MedianFilteredData.RData")
load(file = "./Data/RGenerated/ECPChangePointsMFData.RData")
changepoints_subj_ecp <- list()
changepoints_subj_thr <- list()
changepoints_cusum_subj_ecp <- list()
for (i in 1:15) {
cumsum_RPE <- data.frame(matrix(ncol=3, nrow=19)) # need to initialize every time
colnames(cumsum_RPE) <- c("time.from.start","variable","value")
correct_subject_rows <- which(df_RPE_transformed[,2]==paste0("Subject.",i,".RPE"))
dataholder <- df_RPE_transformed[correct_subject_rows,]
# Computing the CUSUMs of the RPEs
cumsum_RPE[,1] <- dataholder[,1]
cumsum_RPE[,2] <- paste0(dataholder[,2],".CUSUM")
cumsum_RPE[,3] <- cumsum(dataholder[,3]-13)
# -------- Calculation of Fatigue based on RPE>= 13 -----------------
fatigue_from_RPE_threshold <- which(RPE[,i+1]>=RatingThresh) %>%
min() # returns window number
changepoints_subj_thr[[i]] <- RPE[fatigue_from_RPE_threshold,1]
df_RPE<-data.frame(d1=RPE[1],t1=RPE[i+1]) # to make dataframe of subjective ratings for ggplot
#------- Estimation of Fatigue Changepoints based on ECP of the RPE Data --
y <- e.divisive(df_RPE,min.size = 2)$estimates
fatigue_from_ecp_changepoint = y[2:(length(y)-1)] # removing the first and last points as trivials changepoints
changepoints_subj_ecp[[i]] <- df_RPE[fatigue_from_ecp_changepoint,1]
# ------ Repeating the above analysis for the CUSUMs of the RPE --------
y_cumsum <- e.divisive(cumsum_RPE[,c(1,3)],min.size = 2)$estimates
fatigue_ecp_CusumRPE <- y_cumsum[2:length(y_cumsum)-1]
changepoints_cusum_subj_ecp[[i]] <- cumsum_RPE[fatigue_ecp_CusumRPE, 1]
cat('###',paste0("P", i), "{-}",'\n')
assign(paste0("g",i),
ggplot(data = df_transformed,
aes(x=time.from.start, y=value, colour = variable)) +
geom_line(size=1) +
theme_bw() +
scale_colour_manual(values = c("red", "green", "blue","black","grey40")) +
facet_wrap(~variable,nrow=5, scales = "free_y") +
# labs(title = paste("A paneled time-series plot for Participant",i)) +
geom_vline(xintercept= changepoints_medianf[[i]],
show.legend = T, size=2) +
geom_vline(xintercept= changepoints_subj_thr[[i]],
linetype="solid", color="steelblue1",
show.legend = T, size=2) +
geom_vline(xintercept= changepoints_subj_ecp[[i]],
linetype="dotted", color="gray80",
show.legend =T, size=2) +
geom_vline(xintercept= changepoints_cusum_subj_ecp[[i]],
linetype="dashed", color="orange1",
show.legend =T, size=3) +
theme(legend.position="none",
plot.title = element_text(hjust = 0.5))
)
print(get(paste0("g",i)))
cat('\n \n')
cat(paste0('<source> <p> In the above figure, the colored vertical lines correspond to the following. First, the <b> black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. </b> These changepoints are what we obtained in Section 4.1.3. Second, the <font color= #63B8FF> <b>steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. </b> </font> Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the <b> <font color=#CCCCCC> gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. </font> </b>',
'. Fourth, the <b> <font color=#FFA500> orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values. </font> </b> \n \n'))
cat(paste0('<source> <p> Based on the <b>ecp package, the number of changepoints for the the median filtered data for participant ', i,
' is equal to: ',
length(changepoints_medianf[[i]]),
'. These changepoints are located at: ',
paste(changepoints_medianf[[i]], collapse = ", "),
'. <font color= #63B8FF> The threshold based changepoint in the subjective RPE is located at: ',
paste(changepoints_subj_thr[[i]]),
'. </font> </b> In addition, <b> <font color=#CCCCCC> the number of changepoints based on the RPE values for this subject is equal to: ',
length(changepoints_subj_ecp[[i]]),
". These changepoints are located at: ",
paste(changepoints_subj_ecp[[i]], collapse = ", "),
". </font> </b> Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. <b> <font color=#FFA500> The number of these changepoints is equal to: ",
length(changepoints_cusum_subj_ecp[[i]]),
". These changepoints are located at: ",
paste(changepoints_cusum_subj_ecp[[i]], collapse = ", "),
'. </b> </font> The results from the analysis above can be found at: <a href="https://github.com/fmegahed/fatigue-changepoint/blob/master/Data/RGenerated/ChangepointsRPE.RData">ChangepointsRPE.RData</a>. </p> </source>')
)
cat('\n \n')
}
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the the median filtered data for participant 1 is equal to: 3. These changepoints are located at: 841, 3001, 4441. The threshold based changepoint in the subjective RPE is located at: 6600. In addition, the number of changepoints based on the RPE values for this subject is equal to: 4. These changepoints are located at: 1800, 4200, 6600, 9000. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 3600, 6000, 8400. The results from the analysis above can be found at: ChangepointsRPE.RData.
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the the median filtered data for participant 2 is equal to: 2. These changepoints are located at: 241, 3481. The threshold based changepoint in the subjective RPE is located at: 6000. In addition, the number of changepoints based on the RPE values for this subject is equal to: 1. These changepoints are located at: 6000. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 3600, 6000, 8400. The results from the analysis above can be found at: ChangepointsRPE.RData.
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the the median filtered data for participant 3 is equal to: 2. These changepoints are located at: 1081, 1321. The threshold based changepoint in the subjective RPE is located at: 4200. In addition, the number of changepoints based on the RPE values for this subject is equal to: 2. These changepoints are located at: 3600, 7200. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 3000, 5400, 7800. The results from the analysis above can be found at: ChangepointsRPE.RData.
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the the median filtered data for participant 4 is equal to: 4. These changepoints are located at: 1, 1321, 5161, 7681. The threshold based changepoint in the subjective RPE is located at: 3000. In addition, the number of changepoints based on the RPE values for this subject is equal to: 2. These changepoints are located at: 3600, 7200. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 3600, 7200. The results from the analysis above can be found at: ChangepointsRPE.RData.
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the the median filtered data for participant 5 is equal to: 2. These changepoints are located at: 1801, 2281. The threshold based changepoint in the subjective RPE is located at: 2400. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 2400, 4800, 7800. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 2400, 4800, 7800. The results from the analysis above can be found at: ChangepointsRPE.RData.
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the the median filtered data for participant 6 is equal to: 4. These changepoints are located at: 1561, 2521, 3241, 5401. The threshold based changepoint in the subjective RPE is located at: 8400. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 2400, 4800, 7800. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 3600, 6600, 9000. The results from the analysis above can be found at: ChangepointsRPE.RData.
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the the median filtered data for participant 7 is equal to: 3. These changepoints are located at: 1, 3241, 3481. The threshold based changepoint in the subjective RPE is located at: 1800. In addition, the number of changepoints based on the RPE values for this subject is equal to: 2. These changepoints are located at: 3600, 7200. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 3600, 7200. The results from the analysis above can be found at: ChangepointsRPE.RData.
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the the median filtered data for participant 8 is equal to: 2. These changepoints are located at: 1081, 4201. The threshold based changepoint in the subjective RPE is located at: 6600. In addition, the number of changepoints based on the RPE values for this subject is equal to: 4. These changepoints are located at: 2400, 4800, 6600, 9000. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 3600, 6000, 8400. The results from the analysis above can be found at: ChangepointsRPE.RData.
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the the median filtered data for participant 9 is equal to: 3. These changepoints are located at: 1, 5041, 5641. The threshold based changepoint in the subjective RPE is located at: 3000. In addition, the number of changepoints based on the RPE values for this subject is equal to: 2. These changepoints are located at: 4200, 7800. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 2400, 4800, 7800. The results from the analysis above can be found at: ChangepointsRPE.RData.
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the the median filtered data for participant 10 is equal to: 3. These changepoints are located at: 1, 3241, 6601. The threshold based changepoint in the subjective RPE is located at: 2400. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 2400, 4800, 7800. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 2400, 4800, 7800. The results from the analysis above can be found at: ChangepointsRPE.RData.
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the the median filtered data for participant 11 is equal to: 3. These changepoints are located at: 1, 1561, 7441. The threshold based changepoint in the subjective RPE is located at: 10800. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 2400, 6000, 7800. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 3600, 6600. The results from the analysis above can be found at: ChangepointsRPE.RData.
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the the median filtered data for participant 12 is equal to: 5. These changepoints are located at: 1, 2161, 2401, 6121, 6241. The threshold based changepoint in the subjective RPE is located at: 9000. In addition, the number of changepoints based on the RPE values for this subject is equal to: 2. These changepoints are located at: 4200, 7800. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 3600, 6600, 9000. The results from the analysis above can be found at: ChangepointsRPE.RData.
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the the median filtered data for participant 13 is equal to: 3. These changepoints are located at: 1, 841, 6361. The threshold based changepoint in the subjective RPE is located at: 6600. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 3600, 6000, 8400. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 3600, 7200. The results from the analysis above can be found at: ChangepointsRPE.RData.
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the the median filtered data for participant 14 is equal to: 3. These changepoints are located at: 1801, 2041, 4801. The threshold based changepoint in the subjective RPE is located at: 3600. In addition, the number of changepoints based on the RPE values for this subject is equal to: 2. These changepoints are located at: 3600, 7800. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 3000, 5400, 7800. The results from the analysis above can be found at: ChangepointsRPE.RData.
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the the median filtered data for participant 15 is equal to: 4. These changepoints are located at: 1, 4081, 6361, 6961. The threshold based changepoint in the subjective RPE is located at: 3600. In addition, the number of changepoints based on the RPE values for this subject is equal to: 2. These changepoints are located at: 3600, 7200. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 3000, 5400, 7800. The results from the analysis above can be found at: ChangepointsRPE.RData.
save(changepoints_subj_ecp, changepoints_subj_thr, changepoints_cusum_subj_ecp,
file = "./Data/RGenerated/ChangepointsRPE.RData")
In this section, we perform the following tasks:
EDA, where we overlay the subjective ratings with the CUSUM changepoints;
A threshold based comparison, where we use \(\small RPE \ge 13\) to identify when a participant becomes fatigued and we compare that to the changepoints detected by the hybrid CUSUM-EPC approach; and
Utilize the e.divisive() from the ecp package to obtain the changepoints for the Borg Ratings. Note that this function is the univariate approach for the multiple changepoint method examined in Section 4.1.
RatingThresh <- 13 # based on Maman et al. 2017
RPE = read.csv("https://raw.githubusercontent.com/fmegahed/fatigue-changepoint/master/Data/Raw/RPE.csv") # to read subjective rating of RPE data from GitHub'
colnames(RPE)[1] <- "time.from.start"
df_RPE_transformed <- melt(RPE,
id.vars = "time.from.start",
measure.vars = colnames(RPE)[2:16])
df_RPE_transformed[,2] <- paste0(df_RPE_transformed[,2],".RPE")
# to load previously generated data and changepoints
load(file = "./Data/RGenerated/CusumData.RData")
load(file = "./Data/RGenerated/ECPChangePointsCUSUM.RData")
changepoints_subj_ecp <- list()
changepoints_subj_thr <- list()
changepoints_cusum_subj_ecp <- list()
for (i in 1:15) {
cumsum_RPE <- data.frame(matrix(ncol=3, nrow=19)) # need to initialize every time
colnames(cumsum_RPE) <- c("time.from.start","variable","value")
correct_subject_rows <- which(df_RPE_transformed[,2]==paste0("Subject.",i,".RPE"))
dataholder <- df_RPE_transformed[correct_subject_rows,]
# Computing the CUSUMs of the RPEs
cumsum_RPE[,1] <- dataholder[,1]
cumsum_RPE[,2] <- paste0(dataholder[,2],".CUSUM")
cumsum_RPE[,3] <- cumsum(dataholder[,3]-13)
# -------- Calculation of Fatigue based on RPE>= 13 -----------------
fatigue_from_RPE_threshold <- which(RPE[,i+1]>=RatingThresh) %>%
min() # returns window number
changepoints_subj_thr[[i]] <- RPE[fatigue_from_RPE_threshold,1]
df_RPE<-data.frame(d1=RPE[1],t1=RPE[i+1]) # to make dataframe of subjective ratings for ggplot
#------- Estimation of Fatigue Changepoints based on ECP of the RPE Data --
y <- e.divisive(df_RPE,min.size = 2)$estimates
fatigue_from_ecp_changepoint = y[2:(length(y)-1)] # removing the first and last points as trivials changepoints
changepoints_subj_ecp[[i]] <- df_RPE[fatigue_from_ecp_changepoint,1]
# ------ Repeating the above analysis for the CUSUMs of the RPE --------
y_cumsum <- e.divisive(cumsum_RPE[,c(1,3)],min.size = 2)$estimates
fatigue_ecp_CusumRPE <- y_cumsum[2:length(y_cumsum)-1]
changepoints_cusum_subj_ecp[[i]] <- cumsum_RPE[fatigue_ecp_CusumRPE, 1]
cat('###',paste0("P", i), "{-}",'\n')
assign(paste0("g",i),
ggplot(data = df_transformed,
aes(x=time.from.start, y=value, colour = variable)) +
geom_line(size=2) +
theme_bw() +
scale_colour_manual(values = c("red", "green", "blue","black","grey40")) +
facet_wrap(~variable,nrow=5, scales = "free_y") +
# labs(title = paste("A paneled time-series plot for Participant",i)) +
geom_vline(xintercept= changepoints_cusum[[i]],
show.legend = T, size=2) +
geom_vline(xintercept= changepoints_subj_thr[[i]],
linetype="solid", color="steelblue1",
show.legend = T, size=2) +
geom_vline(xintercept= changepoints_subj_ecp[[i]],
linetype="dotted", color="gray80",
show.legend =T, size=2) +
geom_vline(xintercept= changepoints_cusum_subj_ecp[[i]],
linetype="dashed", color="orange1",
show.legend =T, size=3) +
theme(legend.position="none",
plot.title = element_text(hjust = 0.5))
)
print(get(paste0("g",i)))
cat('\n \n')
cat(paste0('<source> <p> In the above figure, the colored vertical lines correspond to the following. First, the <b> black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. </b> These changepoints are what we obtained in Section 4.1.3. Second, the <font color= #63B8FF> <b>steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. </b> </font> Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the <b> <font color=#CCCCCC> gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. </font> </b>',
'. Fourth, the <b> <font color=#FFA500> orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values. </font> </b> \n \n'))
cat(paste0('<source> <p> Based on the <b>ecp package, the number of changepoints for the cusums of the median filtered data for participant ', i,
' is equal to: ',
length(changepoints_cusum[[i]]),
'. These changepoints are located at: ',
paste(changepoints_cusum[[i]], collapse = ", "),
'. <font color= #63B8FF> The threshold based changepoint in the subjective RPE is located at: ',
paste(changepoints_subj_thr[[i]]),
'. </font> </b> In addition, <b> <font color=#CCCCCC> the number of changepoints based on the RPE values for this subject is equal to: ',
length(changepoints_subj_ecp[[i]]),
". These changepoints are located at: ",
paste(changepoints_subj_ecp[[i]], collapse = ", "),
". </font> </b> Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. <b> <font color=#FFA500> The number of these changepoints is equal to: ",
length(changepoints_cusum_subj_ecp[[i]]),
". These changepoints are located at: ",
paste(changepoints_cusum_subj_ecp[[i]], collapse = ", "),
'. </b> </font> Note that the results from the analysis above can be found at: <a href="https://github.com/fmegahed/fatigue-changepoint/blob/master/Data/RGenerated/ChangepointsRPE.RData">ChangepointsRPE.RData</a>. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]). </p> </source>')
)
cat('\n \n')
}
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 1 is equal to: 3. These changepoints are located at: 1, 3481, 4801. The threshold based changepoint in the subjective RPE is located at: 6600. In addition, the number of changepoints based on the RPE values for this subject is equal to: 1. These changepoints are located at: 6600. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 3600, 6000, 8400. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 2 is equal to: 2. These changepoints are located at: 2641, 4201. The threshold based changepoint in the subjective RPE is located at: 6000. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 3600, 6000, 7800. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 3600, 6000, 8400. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 3 is equal to: 3. These changepoints are located at: 601, 1201, 1561. The threshold based changepoint in the subjective RPE is located at: 4200. In addition, the number of changepoints based on the RPE values for this subject is equal to: 2. These changepoints are located at: 3600, 7200. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 3000, 5400, 7800. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 4 is equal to: 4. These changepoints are located at: 721, 3721, 5161, 7441. The threshold based changepoint in the subjective RPE is located at: 3000. In addition, the number of changepoints based on the RPE values for this subject is equal to: 2. These changepoints are located at: 3600, 7200. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 3600, 7200. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 5 is equal to: 2. These changepoints are located at: 841, 1921. The threshold based changepoint in the subjective RPE is located at: 2400. In addition, the number of changepoints based on the RPE values for this subject is equal to: 2. These changepoints are located at: 4800, 7800. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 2400, 4800, 7800. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 6 is equal to: 2. These changepoints are located at: 121, 4561. The threshold based changepoint in the subjective RPE is located at: 8400. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 2400, 4800, 7800. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 3600, 6600, 9000. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 7 is equal to: 4. These changepoints are located at: 1, 721, 1801, 3481. The threshold based changepoint in the subjective RPE is located at: 1800. In addition, the number of changepoints based on the RPE values for this subject is equal to: 2. These changepoints are located at: 3600, 7200. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 3600, 7200. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 8 is equal to: 5. These changepoints are located at: 1, 1561, 3481, 4561, 7441. The threshold based changepoint in the subjective RPE is located at: 6600. In addition, the number of changepoints based on the RPE values for this subject is equal to: 1. These changepoints are located at: 4800. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 3600, 6000, 8400. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 9 is equal to: 2. These changepoints are located at: 3361, 5281. The threshold based changepoint in the subjective RPE is located at: 3000. In addition, the number of changepoints based on the RPE values for this subject is equal to: 2. These changepoints are located at: 4200, 7800. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 2400, 4800, 7800. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 10 is equal to: 2. These changepoints are located at: 961, 5521. The threshold based changepoint in the subjective RPE is located at: 2400. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 2400, 4800, 7800. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 4800, 7800. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 11 is equal to: 2. These changepoints are located at: 1081, 6361. The threshold based changepoint in the subjective RPE is located at: 10800. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 2400, 6000, 7800. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 3600, 6600, 9000. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 12 is equal to: 4. These changepoints are located at: 2281, 3241, 5521, 6121. The threshold based changepoint in the subjective RPE is located at: 9000. In addition, the number of changepoints based on the RPE values for this subject is equal to: 2. These changepoints are located at: 4200, 7800. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 3600, 6600, 9000. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 13 is equal to: 4. These changepoints are located at: 361, 1441, 3841, 5761. The threshold based changepoint in the subjective RPE is located at: 6600. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 3600, 6000, 8400. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 3600, 7200. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 14 is equal to: 3. These changepoints are located at: 1081, 1921, 3721. The threshold based changepoint in the subjective RPE is located at: 3600. In addition, the number of changepoints based on the RPE values for this subject is equal to: 2. These changepoints are located at: 3600, 7800. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 3000, 5400, 7800. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).
In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.
Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 15 is equal to: 2. These changepoints are located at: 1441, 5281. The threshold based changepoint in the subjective RPE is located at: 3600. In addition, the number of changepoints based on the RPE values for this subject is equal to: 2. These changepoints are located at: 3600, 7200. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 3000, 5400, 7800. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).
save(changepoints_subj_thr,changepoints_subj_ecp, changepoints_cusum_subj_ecp,
file = "./Data/RGenerated/ChangepointsRPE.RData")
From B and C, we are hoping to develop a general diagnosis/understanding of how fatigue develops (e.g., we start with a ramp up phase, then steady state, then fatigue starting, and then person is totally fatigued).
Department of Mechanical and Aerospace Engineering, University at Buffalo. ↩
Department of Industrial and Systems Engineering, University at Buffalo. ↩
Farmer School of Business, Miami University. ↩
Farmer School of Business, Miami University. Corresponding author. fmegahed@miamioh.edu.↩
5.3 Comments on the ECP and dfphase1 Methods
Based on the results in Sections 4.1 and 4.2, the approach of Matterson and James (2016) is preferred since it:
Results in a more limited number of changepoints, highlighting more (practically) significant changes in the gait parameters. This result was somewhat surprising since we used the default settings for the approach of Capizzi and Masarotto (2016) throught the default mphase1(), which prevents having changepoints within 5 windows; and
Is more computationally efficient. It had a shorter run time than the other approach.
In addition, from the analysis, the changepoints based on the CUSUM of the median filtered data are the most insightful. Specifically, one can discern how each participant’s performance change as the experiment goes on. For example, P10 has a clear fatigued performance in the end, where the cusums of the stride length and stride height are decreasing rapidly after stride 5521.